

Research

Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal

BMJ 2020; 369 doi: https://doi.org/10.1136/bmj.m1328 (Published 07 April 2020) Cite this as: BMJ 2020;369:m1328

Linked Editorial

Prediction models for diagnosis and prognosis in covid-19


  1. Laure Wynants, assistant professor1 2,
  2. Ben Van Calster, associate professor2 3,
  3. Gary S Collins, professor4 5,
  4. Richard D Riley, professor6,
  5. Georg Heinze, associate professor7,
  6. Ewoud Schuit, assistant professor8 9,
  7. Marc M J Bonten, professor8 10,
  8. Darren L Dahly, principal statistician11 12,
  9. Johanna A Damen, assistant professor8 9,
  10. Thomas P A Debray, assistant professor8 9,
  11. Valentijn M T de Jong, assistant professor8 9,
  12. Maarten De Vos, associate professor2 13,
  13. Paula Dhiman, research fellow4 5,
  14. Maria C Haller, medical doctor7 14,
  15. Michael O Harhay, assistant professor15 16,
  16. Liesbet Henckaerts, assistant professor17 18,
  17. Pauline Heus, assistant professor8 9,
  18. Michael Kammer, research associate7 19,
  19. Nina Kreuzberger, research associate20,
  20. Anna Lohmann, researcher in training21,
  21. Kim Luijken, doctoral candidate21,
  22. Jie Ma, medical statistician5,
  23. Glen P Martin, lecturer22,
  24. David J McLernon, senior research fellow23,
  25. Constanza L Andaur Navarro, doctoral student8 9,
  26. Johannes B Reitsma, associate professor8 9,
  27. Jamie C Sergeant, senior lecturer24 25,
  28. Chunhu Shi, research associate26,
  29. Nicole Skoetz, medical doctor19,
  30. Luc J M Smits, professor1,
  31. Kym I E Snell, lecturer6,
  32. Matthew Sperrin, senior lecturer27,
  33. René Spijker, information specialist8 9 28,
  34. Ewout W Steyerberg, professor3,
  35. Toshihiko Takada, assistant professor8,
  36. Ioanna Tzoulaki, assistant professor29 30,
  37. Sander M J van Kuijk, research fellow31,
  38. Bas C T van Bussel, medical doctor1 32,
  39. Iwan C C van der Horst, professor32,
  40. Florien S van Royen, research fellow8,
  41. Jan Y Verbakel, assistant professor33 34,
  42. Christine Wallisch, research fellow7 35 36,
  43. Jack Wilkinson, research fellow22,
  44. Robert Wolff, medical doctor37,
  45. Lotty Hooft, associate professor8 9,
  46. Karel G M Moons, professor8 9,
  47. Maarten van Smeden, assistant professor8
  1. 1Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Peter Debyeplein 1, 6229 HA Maastricht, Netherlands
  2. 2Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  3. 3Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
  4. 4Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
  5. 5NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
  6. 6Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Keele, UK
  7. 7Section for Clinical Biometrics, Centre for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
  8. 8Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  9. 9Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  10. 10Department of Medical Microbiology, University Medical Centre Utrecht, Utrecht, Netherlands
  11. 11HRB Clinical Research Facility, Cork, Ireland
  12. 12School of Public Health, University College Cork, Cork, Ireland
  13. 13Department of Electrical Engineering, ESAT Stadius, KU Leuven, Leuven, Belgium
  14. 14Ordensklinikum Linz, Hospital Elisabethinen, Department of Nephrology, Linz, Austria
  15. 15Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  16. 16Palliative and Advanced Illness Research Center and Division of Pulmonary and Critical Care Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  17. 17Department of Microbiology, Immunology and Transplantation, KU Leuven-University of Leuven, Leuven, Belgium
  18. 18Department of General Internal Medicine, KU Leuven-University Hospitals Leuven, Leuven, Belgium
  19. 19Department of Nephrology, Medical University of Vienna, Vienna, Austria
  20. 20Evidence-Based Oncology, Department I of Internal Medicine and Center for Integrated Oncology Aachen Bonn Cologne Dusseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  21. 21Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, Netherlands
  22. 22Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
  23. 23Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
  24. 24Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
  25. 25Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
  26. 26Division of Nursing, Midwifery and Social Work, School of Health Sciences, University of Manchester, Manchester, UK
  27. 27Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  28. 28Amsterdam UMC, University of Amsterdam, Amsterdam Public Health, Medical Library, Netherlands
  29. 29Department of Epidemiology and Biostatistics, Imperial College London School of Public Health, London, UK
  30. 30Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina, Greece
  31. 31Department of Clinical Epidemiology and Medical Technology Assessment, Maastricht University Medical Centre+, Maastricht, Netherlands
  32. 32Department of Intensive Care, Maastricht University Medical Centre+, Maastricht University, Maastricht, Netherlands
  33. 33EPI-Centre, Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
  34. 34Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
  35. 35Charité Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
  36. 36Berlin Institute of Health, Berlin, Germany
  37. 37Kleijnen Systematic Reviews, York, UK
  1. Correspondence to: L Wynants laure.wynants{at}maastrichtuniversity.nl
  • Accepted 31 March 2020
  • Final version accepted 12 January 2021

Abstract

Objective To review and assess the validity and usefulness of published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at increased risk of covid-19 infection or being admitted to hospital with the disease.

Design Living systematic review and critical appraisal by the COVID-PRECISE (Precise Risk Estimation to optimise covid-19 Care for Infected or Suspected patients in diverse sEttings) group.

Data sources PubMed and Embase through Ovid, up to 1 July 2020, supplemented with arXiv, medRxiv, and bioRxiv up to 5 May 2020.

Study selection Studies that developed or validated a multivariable covid-19 related prediction model.

Data extraction At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool).

Results 37 421 titles were screened, and 169 studies describing 232 prediction models were included. The review identified seven models for identifying people at risk in the general population; 118 diagnostic models for detecting covid-19 (75 were based on medical imaging, 10 to diagnose disease severity); and 107 prognostic models for predicting mortality risk, progression to severe disease, intensive care unit admission, ventilation, intubation, or length of hospital stay. The most frequent types of predictors included in the covid-19 prediction models are vital signs, age, comorbidities, and image features. Flu-like symptoms are often predictive in diagnostic models, while sex, C reactive protein, and lymphocyte counts are frequent prognostic factors. Reported C index estimates from the strongest form of validation available per model ranged from 0.71 to 0.99 in prediction models for the general population, from 0.65 to more than 0.99 in diagnostic models, and from 0.54 to 0.99 in prognostic models. All models were rated at high or unclear risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, high risk of model overfitting, and unclear reporting. Many models did not include a description of the target population (n=27, 12%) or care setting (n=75, 32%), and only 11 (5%) were externally validated with a calibration plot. The Jehi diagnostic model and the 4C mortality score were identified as promising models.

Conclusion Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that almost all published prediction models are poorly reported, and at high risk of bias such that their reported predictive performance is probably optimistic. However, we have identified two (one diagnostic and one prognostic) promising models that should soon be validated in multiple cohorts, preferably through collaborative efforts and data sharing to also allow an investigation of the stability and heterogeneity in their performance across populations and settings. Details on all reviewed models are publicly available at https://www.covprecise.org/. Methodological guidance as provided in this paper should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, prediction model authors should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline.

Readers' note This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is update 3 of the original article published on 7 April 2020 (BMJ 2020;369:m1328). Previous updates can be found as data supplements (https://www.bmj.com/content/369/bmj.m1328/related#datasupp). When citing this paper please consider adding the update number and date of access for clarity.

Introduction

The novel coronavirus disease 2019 (covid-19) presents an important and urgent threat to global health. Since the outbreak in early December 2019 in the Hubei province of the People's Republic of China, the number of patients confirmed to have the disease has exceeded 47 million as the disease spread globally, and the number of people infected is probably much higher. More than 1.2 million people have died from covid-19 (up to 3 November 2020).1 Despite public health responses aimed at containing the disease and delaying the spread, several countries have been confronted with a critical care crisis, and more countries could follow.234 Outbreaks lead to important increases in the demand for hospital beds and shortages of medical equipment, while medical staff themselves can also become infected. Several regions have had or are experiencing second waves, and despite improvements in testing and tracing, several regions are again facing the limits of their test capacity, hospital resources, and healthcare staff.56

To mitigate the burden on the healthcare system, while also providing the best possible care for patients, efficient diagnosis and information on the prognosis of the disease are needed. Prediction models that combine several variables or features to estimate the risk of people being infected or experiencing a poor outcome from the infection could assist medical staff in triaging patients when allocating limited healthcare resources. Models ranging from rule based scoring systems to advanced machine learning models (deep learning) have been proposed and published in response to a call to share relevant covid-19 research findings rapidly and openly to inform the public health response and help save lives.7

We aimed to systematically review and critically appraise all currently available prediction models for covid-19, in particular models to predict the risk of covid-19 infection or being admitted to hospital with the disease, models to predict the presence of covid-19 in patients with suspected infection, and models to predict the prognosis or course of infection in patients with covid-19. We included model development and external validation studies. This living systematic review, with periodic updates, is being conducted by the international COVID-PRECISE (Precise Risk Estimation to optimise covid-19 Care for Infected or Suspected patients in diverse sEttings; https://www.covprecise.org/) group in collaboration with the Cochrane Prognosis Methods Group.

Methods

We searched the publicly available, continuously updated publication list of the covid-19 living systematic review.8 We validated whether the list is fit for purpose (online supplementary material) and further supplemented it with studies on covid-19 retrieved from arXiv. The online supplementary material presents the search strings. We included studies if they developed or validated a multivariable model or scoring system, based on individual participant level data, to predict any covid-19 related outcome. These models included three types of prediction models: diagnostic models to predict the presence or severity of covid-19 in patients with suspected infection; prognostic models to predict the course of infection in patients with covid-19; and prediction models to identify people in the general population at risk of covid-19 infection or at risk of being admitted to hospital with the disease.

We searched the database repeatedly up to 1 July 2020 (supplementary table 1). As of the third update (search date 1 July), we only include peer reviewed articles (indexed in PubMed and Embase through Ovid). Preprints (from bioRxiv, medRxiv, and arXiv) that were already included in previous updates of the systematic review remain included in the analysis. Reassessment takes place after publication of a preprint in a peer reviewed journal. No restrictions were made on the setting (eg, inpatients, outpatients, or general population), prediction horizon (how far ahead the model predicts), included predictors, or outcomes. Epidemiological studies that aimed to model disease transmission or fatality rates, diagnostic test accuracy studies, and predictor finding studies were excluded. We focus on studies published in English. Starting with the second update, retrieved records were initially screened by a text analysis tool developed using artificial intelligence to prioritise sensitivity (supplementary material). Titles, abstracts, and full texts were screened for eligibility in duplicate by independent reviewers (pairs from LW, BVC, MvS) using EPPI-Reviewer,9 and discrepancies were resolved through discussion.

Data extraction of included articles was done by two independent reviewers (from LW, BVC, GSC, TPAD, MCH, GH, KGMM, RDR, ES, LJMS, EWS, KIES, CW, AL, JM, TT, JAAD, KL, JBR, LH, CS, MS, MCH, NS, NK, SMJvK, JCS, PD, CLAN, RW, GPM, IT, JYV, DLD, JW, FSvR, PH, VMTdJ, BCTvB, ICCvdH, DJM, MK, and MvS). Reviewers used a standardised data extraction form based on the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist10 and PROBAST (prediction model risk of bias assessment tool; www.probast.org) for assessing the reported prediction models.11 We sought to extract each model's predictive performance by using whatever measures were presented. These measures included any summaries of discrimination (the extent to which predicted risks discriminate between participants with and without the outcome), and calibration (the extent to which predicted risks correspond to observed risks) as recommended in the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis; www.tripod-statement.org) statement.12 Discrimination is often quantified by the C index (C index=1 if the model discriminates perfectly; C index=0.5 if discrimination is no better than chance). Calibration is often quantified by the calibration intercept (which is zero when the risks are not systematically overestimated or underestimated) and calibration slope (which is one if the predicted risks are not too extreme or too moderate).13 We focused on performance statistics as estimated from the strongest available form of validation (in order of strength: external (evaluation in an independent database), internal (bootstrap validation, cross validation, random training test splits, temporal splits), apparent (evaluation by using exactly the same data used for development)). Any discrepancies in data extraction were discussed between reviewers, and remaining conflicts were resolved by LW or MvS. The online supplementary material provides details on data extraction. Some studies investigated multiple models and some models were investigated in multiple studies (that is, in external validation studies). The unit of analysis was a model within a study, unless stated otherwise. We considered aspects of PRISMA (preferred reporting items for systematic reviews and meta-analyses)14 and TRIPOD12 in reporting our study. Details on all reviewed studies and prediction models are publicly available at https://www.covprecise.org/.
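To make these two performance measures concrete, the following minimal Python sketch (ours, not from the review; the data are simulated and the function names illustrative) computes a C index by pairwise comparison and estimates the calibration intercept and slope by regressing the outcome on the logit of the predicted risks:

```python
import numpy as np
import statsmodels.api as sm

def c_index(y, p):
    """C index for binary outcomes: the proportion of (event, non-event)
    pairs in which the event received the higher predicted risk;
    ties count as half concordant."""
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def calibration(y, p):
    """Calibration-in-the-large (intercept, ~0 when risks are not
    systematically over- or underestimated) and calibration slope
    (~1 when predicted risks are not too extreme or too moderate)."""
    lp = np.log(p / (1 - p))  # logit of the predicted risk
    slope = sm.Logit(y, sm.add_constant(lp)).fit(disp=0).params[1]
    intercept = sm.GLM(y, np.ones_like(lp), family=sm.families.Binomial(),
                       offset=lp).fit().params[0]  # slope fixed at 1 via offset
    return intercept, slope

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, 500)  # predicted risks for 500 simulated patients
y = rng.binomial(1, p)            # outcomes drawn from those risks
print(c_index(y, p), calibration(y, p))  # intercept near 0, slope near 1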

Results

We retrieved 37 412 titles through our systematic search (of which 23 203 were included in the present update; supplementary table 1, fig 1). We included a further 9 studies that were publicly available but were not detected by our search. Of 37 421 titles, 444 studies were retained for abstract and full text screening (of which 169 are included in the present update). One hundred sixty nine studies describing 232 prediction models met the inclusion criteria (of which 62 studies and 87 models were added in the present update, supplementary table 1).15161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183 These studies were selected for data extraction and critical appraisal. The unit of analysis was the model within a study: of these 232 models, 208 were unique, newly developed models for covid-19. The remaining 24 analyses were external validations of existing models (in a study other than the model development study). Some models were validated more than once (in different studies, as described below). Many models are publicly available (box 1). A database with the description of each model and its risk of bias assessment can be found on https://www.covprecise.org/.

Box 1

Availability of models in a format for use in clinical practice

Two hundred and eight unique models were developed in the included studies. Thirty (14%) of these models were presented as a model equation including intercept and regression coefficients. Eight (4%) models were only partially presented (eg, intercept or baseline risk were missing). The remaining models did not provide the underlying model equation.

Seventy two models (35%) are available as a tool for use in clinical practice (in addition to or instead of a published equation). Twenty seven models were presented as a web calculator (13%), 12 as a sum score (6%), 11 as a nomogram (5%), 8 as a software object (4%), 5 as a decision tree or set of predictions for subgroups (2%), 3 as a chart score (1%), and 6 in other usable formats (3%).

All these presentation formats make predictions readily available for use in the clinic. However, because all models were at high or unclear risk of bias, we do not recommend their routine use before they are externally validated, ideally by independent investigators.


Primary datasets

One hundred seventy four (75%) models used data from a single country (table 1), 42 (18%) models used international data, and for 16 (7%) models it was unclear how many (and which) countries contributed data. Two (1%) models used simulated data and 12 (5%) used proxy data to estimate covid-19 related risks (eg, Medicare claims data from 2015 to 2016). Most models were intended for use in confirmed covid-19 cases (47%) and a hospital setting (51%). The average patient age ranged from 39 to 71 years, and the proportion of men ranged from 35% to 75%, although this information was often not reported. One study developed a prediction model for use in paediatric patients.27

Table 1

Characteristics of reviewed prediction models for diagnosis and prognosis of coronavirus disease 2019 (covid-19)

Based on the studies that reported study dates, data were collected from December 2019 to June 2020. Some centres provided data to multiple studies and several studies used open Github184 or Kaggle185 data repositories (version or date of access often unspecified), so it was unclear how much these datasets overlapped across our identified studies.

Among the diagnostic model studies, the reported prevalence of covid-19 varied between 7% and 71% (if a cross sectional or cohort design was used). Because 75 diagnostic studies used either case-control sampling or an unclear method of data collection, the prevalence in these diagnostic studies might not be representative of their target population.

Among the studies that developed prognostic models to predict mortality risk in people with confirmed or suspected infection, the percentage of deaths ranged from 1% to 52%. This wide variation is partly because of substantial sampling bias caused by studies excluding participants who still had the disease at the end of the study period (that is, they had neither recovered nor died). Additionally, length of follow-up varied between studies (but was often not reported), and there is likely to be local and temporal variation in how people were diagnosed as having covid-19 or were admitted to hospital (and therefore recruited for the studies).

Models to predict risk of covid-19 in the general population

We identified seven models that predicted risk of covid-19 in the general population. Three models from one study used hospital admission for non-tuberculosis pneumonia, influenza, acute bronchitis, or upper respiratory tract infections as proxy outcomes in a dataset without any patients with covid-19.16 Among the predictors were age, sex, previous hospital admission, comorbidities, and social determinants of health. The study reported C indices of 0.73, 0.81, and 0.81. A fourth model used deep learning on thermal videos of the faces of people wearing facemasks to determine abnormal breathing (not covid related) with a reported sensitivity of 80%.92 A fifth model used demographics, symptoms, and contact history in a mobile app to assist general practitioners in collecting data and to risk-stratify patients. It was compared with two further models that included additional blood values, and blood values plus computed tomography (CT) images. The authors reported a C index of 0.71 with demographics only, which rose to 0.97 and 0.99 as blood values and imaging characteristics were added.151 Calibration was not assessed in any of the general population models.

Diagnostic models to detect covid-19 in patients with suspected infection

We identified 33 multivariable models to distinguish between patients with and without covid-19. Most models targeted patients with suspected covid-19. Reported C index values ranged between 0.65 and 0.99. Calibration was assessed for seven models using calibration plots (including two at external validation), with mixed results. The most often included predictors (≥10 times) were vital signs (eg, temperature, heart rate, respiratory rate, oxygen saturation, blood pressure), flu-like signs and symptoms (eg, shiver, fatigue), age, electrolytes, image features (eg, pneumonia signs on CT scan), contact with individuals with confirmed covid-19, lymphocyte count, neutrophil count, cough or sputum, sex, leukocytes, liver enzymes, and red cell distribution width.

Ten studies aimed to diagnose severe disease in patients with covid-19: nine in adults with reported C indices between 0.80 and 0.99, and one in children that reported perfect classification of severe disease.27 Calibration was not assessed in any of the models. Predictors of severe covid-19 used more than once were comorbidities, liver enzymes, C reactive protein, imaging features, lymphocyte count, and neutrophil count.

Seventy five prediction models were proposed to support the diagnosis of covid-19 or covid-19 pneumonia (and some also to monitor progression) based on images. Most studies used CT images or chest radiographs. Others used spectrograms of cough sounds55 and lung ultrasound.75 The predictive performance varied considerably, with reported C index values ranging from 0.70 to more than 0.99. Only one model based on imaging was evaluated with a calibration plot, and it appeared to be well calibrated at external validation.186

Prognostic models for patients with a diagnosis of covid-19

We identified 107 prognostic models for patients with a diagnosis of covid-19. The intended use of these models (that is, when to use them, and for whom) was often not clearly described. Prediction horizons varied between 1 and 37 days, but were often unspecified.

Of these models, 39 estimated mortality risk and 28 aimed to predict progression to a severe or critical disease. The remaining studies used other outcomes (single or as part of a composite) including recovery, length of hospital stay, intensive care unit admission, intubation, (duration of) mechanical ventilation, acute respiratory distress syndrome, cardiac injury, and thrombotic complications. One study used data from 2015 to 2019 to predict mortality and prolonged assisted mechanical ventilation (as a non-covid-19 proxy outcome).115 The most frequently used categories of prognostic factors (for any outcome, included at least 20 times) were age, comorbidities, vital signs, image features, sex, lymphocyte count, and C reactive protein.

Studies that predicted mortality reported C indices between 0.68 and 0.98. Four studies also presented calibration plots (including at external validation for three models), all indicating miscalibration1569118 or showing plots for integer scores without clearly explaining how these were translated into predicted risks.143 The studies that developed models to predict progression to a severe or critical disease reported C indices between 0.58 and 0.99. Five of these models were also evaluated with calibration plots, two of them at external validation. Even though calibration appeared good, the plots were constructed in an unclear manner.85121 Reported C indices for other outcomes varied between 0.54 (admission to intensive care) and 0.99 (severe symptoms three days after admission), and five models had calibration plots (of which three at external validation), with mixed results.

Risk of bias

All models were at high (n=226, 97%) or unclear (n=6, 3%) risk of bias according to assessment with PROBAST, which suggests that their predictive performance when used in practice is probably lower than that reported (fig 2). Therefore, we have cause for concern that the predictions of the proposed models are unreliable when used in other people. Figure 2 and box 2 give details on common causes of risk of bias for each type of model.

Box 2

Common causes of risk of bias in the reported prediction models

Models to predict coronavirus disease 2019 (covid-19) risk in the general population

All of these models had unclear or high risk of bias for the participant, outcome, and analysis domains. All were based on proxy outcomes to predict covid-19 related risks, such as presence of, or hospital admission due to, severe respiratory disease, in the absence of data from patients with covid-19.1692151

Diagnostic models

Ten models (30%) used inappropriate data sources (eg, due to a non-nested case-control design), nine (27%) used inappropriate inclusion or exclusion criteria such that the study data were not representative of the target population, and eight (24%) selected controls that were not representative of the target population for a diagnostic model (eg, controls for a screening model had viral pneumonia). Other frequent problems were dichotomisation of predictors (nine models, 27%), and tests used to determine the outcome (eight models, 24%) or predictor definitions or measurement procedures (seven models, 21%) that varied between participants.

Diagnostic models for severity classification

Two models (20%) used predictor data that were assessed while the severity (the outcome) was already known. Other concerns included non-standard outcome definitions or the lack of a prespecified outcome definition (two models, 20%), predictor measurements (eg, fever) being part of the outcome definition (two models, 20%), and outcomes being assessed with knowledge of predictor measurements (two models, 20%).

Diagnostic models based on medical imaging

Generally, studies did not clearly report which patients had imaging during clinical routine. Fifty five (73%) used an inappropriate or unclear study design to collect data (eg, a non-nested case-control design). It was often unclear (39 models, 52%) whether the selection of controls was made from the target population (that is, patients with suspected covid-19). Outcome definitions were frequently not defined or determined in the same way in all participants (18 models, 24%). Diagnostic model studies that used medical images as predictors were all scored as unclear on the predictor domain. These publications often lacked clear information on the preprocessing steps (eg, cropping of images). Moreover, complex machine learning algorithms transform images into predictors in a complex way, which makes it challenging to fully apply the PROBAST predictors section to such imaging studies. However, a more favourable assessment of the predictor domain would not lead to a better overall judgment regarding risk of bias for the included models. Careful descriptions of model specification and subsequent estimation were often lacking, challenging the transparency and reproducibility of the models. Studies used different deep learning architectures, some established and others specifically designed, without benchmarking the architecture used against others.

Prognostic models

Dichotomisation of predictors was a frequent concern (22 models, 21%). Other problems included inappropriate inclusions or exclusions of study participants (18 models, 17%). Study participants were often excluded because they had not developed the outcome by the end of the study period but were still in follow-up (that is, they were in hospital but had not yet recovered or died), yielding a selected study sample (12 models, 11%). Additionally, many models (16 models, 15%) did not account for censoring or competing risks.


Ninety eight models (42%) had a high risk of bias for the participants domain, which indicates that the participants enrolled in the studies might not be representative of the models' targeted populations. Unclear reporting on the inclusion of participants led to an unclear risk of bias assessment in 58 models (25%), and 76 (33%) had a low risk of bias for the participants domain. Fifteen models (6%) had a high risk of bias for the predictor domain, which indicates that predictors were not available at the models' intended time of use, not clearly defined, or influenced by the outcome measurement. One hundred and thirty five (58%) models were rated unclear and 82 (35%) rated at low risk of bias for the predictor domain. Most studies used outcomes that are easy to assess (eg, death, presence of covid-19 by laboratory confirmation), and hence 95 (41%) were rated at low risk of bias. However, there was cause for concern about bias induced by the outcome measurement in 50 models (22%), for example, due to the use of subjective or proxy outcomes (eg, non-covid-19 severe respiratory infections). Eighty seven models (38%) had an unclear risk of bias due to opaque or ambiguous reporting. Two hundred and eighteen (94%) models were at high risk of bias for the analysis domain. The reporting was insufficiently clear to assess risk of bias in the analysis in 13 studies (6%). Only one model had a low risk of bias for the analysis domain (<1%). Twenty nine (13%) models had a low risk of bias on all domains except analysis, indicating adequate data collection and study design, but issues that could have been avoided by conducting a better statistical analysis. Many studies had small to modest sample sizes (table 1), which led to an increased risk of overfitting, particularly if complex modelling strategies were used. In addition, 50 models (22%) were neither internally nor externally validated. Performance statistics calculated on the development data from these models are likely optimistic. Calibration was only assessed for 22 models using calibration plots (10%), of which 11 on external validation data.

We found two models that were mostly of good quality, built on large datasets, and rated low risk of bias on most domains, but with an overall rating of unclear risk of bias owing to unclear details on one signalling question within the analysis domain (table 2 provides a summary). Jehi and colleagues presented findings from developing a diagnostic model; however, there were substantial missing data and it remains unclear whether the use of median imputation influenced results, and there are unexplained discrepancies between the online calculator, nomogram, and published logistic regression model.141 Hence, the calculator should not be used without further validation. Knight and colleagues developed a prognostic model for in-hospital mortality; however, continuous predictors were dichotomised, which reduces the granularity of predicted risks (even though the model had a C index comparable with that of a generalised additive model).143 The model was also converted into a sum score, but it was unclear how the scores were translated to the predicted mortality risks that were used to evaluate calibration.

Table 2

Prediction models with unclear risk of bias overall and large development samples

External validation

Forty six models were developed and externally validated in the same study (in an independent dataset, excluding random training test splits and temporal splits). In addition, 24 external validations of models (developed for covid-19 or before the covid-19 pandemic) were performed in separate studies. However, none of the external validations was scored as low risk of bias: three were rated as unclear risk of bias, and 67 were rated as high risk of bias. One common concern is that the datasets used for the external validation were likely not representative of the target population (eg, patients not being recruited consecutively, use of an inappropriate study design, use of unrepresentative controls, exclusion of patients still in follow-up). Consequently, predictive performance could differ if the models are applied in the targeted population. Moreover, only 15 (21%) external validations had 100 or more events, which is the recommended minimum.187188 Only 11 (16%) external validations presented a calibration plot.

Table 3 shows the results of external validations that had at most an unclear risk of bias and at least 100 events in the external validation set. The model by Jehi et al has been discussed above.141 Luo and colleagues performed a validation of the CURB-65 score, originally developed to predict mortality of community acquired pneumonia, to assess its ability to predict in-hospital mortality in patients with confirmed covid-19. This validation was conducted in a large retrospective cohort of patients admitted to two Chinese hospitals designated to treat patients with pneumonia from SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2).155 It was unclear whether all consecutive patients were included (although this is likely given the retrospective design), no calibration plot was used because the score gives an integer as output rather than estimated risks, and the score uses dichotomised predictors. Overall, the external validation by Luo et al was performed well. Studies that validated CURB-65 in patients with covid-19 obtained C indices of 0.58, 0.74, 0.75, 0.84, and 0.88.130148155164189 These observed differences might be due to differences in risk of bias (all except Luo et al were rated high risk of bias), heterogeneity in study populations (South Korea, China, Turkey, and the United States), outcome definitions (progression to severe covid-19 v mortality), and sampling variability (the numbers of events were 36, 55, 131, 201, and unclear).

Table 3

External validations with unclear adventure of bias and large validation samples

Discussion

In this systematic review of prediction models related to the covid-19 pandemic, we identified and critically appraised 232 models described in 169 studies. These prediction models can be divided into three categories: models for the general population to predict the risk of having covid-19 or being admitted to hospital for covid-19; models to support the diagnosis of covid-19 in patients with suspected infection; and models to support the prognostication of patients with covid-19. All models reported moderate to excellent predictive performance, but all were appraised to have high or uncertain risk of bias owing to a combination of poor reporting and poor methodological conduct for participant selection, predictor description, and the statistical methods used. Models were developed on data from different countries, but the majority used data from a single country. Often, the available sample sizes and numbers of events for the outcomes of interest were limited. This problem is well known when building prediction models and increases the risk of overfitting the model.190 A high risk of bias implies that the performance of these models in new samples will probably be worse than that reported by the researchers. Therefore, the estimated C indices, often close to 1 and indicating near perfect discrimination, are probably optimistic. The majority of studies developed new models specifically for covid-19, but only 46 carried out an external validation, and calibration was rarely assessed. We cannot yet recommend any of the identified prediction models for widespread use in clinical practice, although a few diagnostic and prognostic models originated from studies that were clearly of better quality. We suggest that these models should be further validated in other datasets, and ideally by independent investigators.141143

Challenges and opportunities

The main aim of prediction models is to support medical decision making in individual patients. Therefore, it is vital to identify a target setting in which predictions serve a clinical need (eg, emergency department, intensive care unit, general practice, symptom monitoring app in the general population), and a representative dataset from that setting (preferably comprising consecutive patients) on which the prediction model can be developed and validated. This clinical setting and the patient characteristics should be described in detail (including timing within the disease course, the severity of disease at the moment of prediction, and the comorbidities), so that readers and clinicians are able to judge whether the proposed model could be suited to their population. Unfortunately, the studies included in our systematic review often lacked an adequate description of the target setting and study population, which leaves users of these models in doubt about the models' applicability. Although we recognise that the earlier studies were done under severe time constraints, we recommend that any studies currently in preprint and all future studies should adhere to the TRIPOD reporting guideline12 to improve the description of their study population and guide their modelling choices. TRIPOD translations (eg, in Chinese and Japanese) are also available at https://www.tripod-statement.org.

A better description of the study population could also help us understand the observed variability in the reported outcomes across studies, such as covid-19 related mortality and covid-19 prevalence. The variability in mortality could be related to differences in included patients (eg, age, comorbidities) and in interventions for covid-19. The variability in prevalence could in part reflect different diagnostic standards across studies.

Covid-19 prediction will often not present as a simple binary classification task. Complexities in the data should be handled accordingly. For example, a prediction horizon should be specified for prognostic outcomes (eg, 30 day mortality). If study participants have neither recovered nor died within that time period, their data should not be excluded from analysis, which some reviewed studies have done. Instead, an appropriate time to event analysis should be considered to allow for administrative censoring.13 Censoring for other reasons, for example because of quick recovery and loss to follow-up of patients who are no longer at risk of death from covid-19, could necessitate analysis in a competing risk framework.191
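As an illustration of the recommended approach, here is a minimal sketch, assuming the Python lifelines library; the toy data, column names, and 30 day horizon are ours and not taken from any reviewed study. Patients still in hospital at the end of follow-up remain in the analysis as censored observations rather than being excluded:

```python
import pandas as pd
from lifelines import CoxPHFitter

# toy cohort: follow-up capped at 30 days (administrative censoring);
# died=0 means alive at last follow-up, including patients still in hospital
df = pd.DataFrame({
    "age":  [54, 67, 71, 49, 80, 62, 58, 75],
    "crp":  [12.0, 88.5, 40.1, 5.3, 120.7, 66.0, 23.4, 95.2],
    "days": [30, 8, 21, 30, 5, 30, 30, 14],
    "died": [0, 1, 1, 0, 1, 0, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days", event_col="died")  # censoring handled by the model

# predicted 30 day mortality risk for a new patient
new = pd.DataFrame({"age": [70], "crp": [90.0]})
risk_30d = 1 - cph.predict_survival_function(new, times=[30]).iloc[0, 0]

# if censoring is informative (eg, recovery and discharge), a competing
# risks analysis (eg, Aalen-Johansen estimates) would be needed instead
```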

We reviewed 75 studies that used only medical images to diagnose covid-19 or covid-19 related pneumonia, or to assist in the segmentation of lung images, the majority using advanced machine learning methodology. The predictive performance measures showed a high to almost perfect ability to identify covid-19, although these models and their evaluations also had a high risk of bias, notably because of poor reporting and an artificial mix of patients with and without covid-19. Currently, none of these models is recommended for use in clinical practice. An independent systematic review and critical appraisal (using PROBAST12) of machine learning models for covid-19 using chest radiographs and CT scans came to the same conclusions, even though they focused on models that met a minimum requirement of study quality based on specialised quality metrics for the assessment of radiomics and deep learning based diagnostic models in radiology.192

A prediction model applied in a new healthcare setting or country often produces predictions that are miscalibrated193 and might need to be updated before it can safely be applied in that new setting.13 This requires data from patients with covid-19 to be available from that system. Instead of developing and updating predictions in each local setting, the use of individual participant data from multiple countries and healthcare systems might allow a better understanding of the generalisability and implementation of prediction models across different settings and populations. This approach could greatly improve the applicability and robustness of prediction models in routine care.194195196197198
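As a sketch of what such updating can look like in its simplest form, logistic recalibration re-estimates the intercept and slope on local data while keeping the original model's coefficients fixed. This illustration uses simulated data and statsmodels; it is not a procedure taken from any reviewed study:

```python
import numpy as np
import statsmodels.api as sm

def recalibrate(lp_local, y_local):
    """Fit outcome ~ a + b * original_linear_predictor on local data.
    Apply as: updated_risk = expit(a + b * lp)."""
    fit = sm.Logit(y_local, sm.add_constant(lp_local)).fit(disp=0)
    return fit.params  # [a, b]

rng = np.random.default_rng(1)
lp = rng.normal(0.5, 1.0, 500)                      # original model's linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-(lp - 0.8))))  # local risks systematically lower
a, b = recalibrate(lp, y)
updated_risk = 1 / (1 + np.exp(-(a + b * lp)))      # recalibrated predictions
```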

The evidence base for the development and validation of prediction models related to covid-19 will continue to increase over the coming months. To leverage the full potential of these evolutions, international and interdisciplinary collaboration in terms of data acquisition, model building, and validation is crucial.

Study limitations

With new publications on covid-19 related prediction models rapidly entering the medical literature, this systematic review cannot be viewed as an up-to-date list of all currently available covid-19 related prediction models. Also, 80 of the studies we reviewed were only available as preprints. These studies might improve after peer review, when they enter the official medical literature; we will reassess these peer reviewed publications in future updates. We also found other prediction models that are currently being used in clinical practice without scientific publications,199 and web risk calculators launched for use while the scientific manuscript is still under review (and unavailable on request).200 These unpublished models naturally fall outside the scope of this review of the literature. As we have argued extensively elsewhere,201 transparent reporting that enables validation by independent researchers is key for predictive analytics, and clinical guidelines should only recommend publicly available and verifiable algorithms.

Implications for practice

All reviewed prediction models were found to have an unclear or high risk of bias, and evidence from independent external validations of the newly developed models is still scarce. However, the urgent need for diagnostic and prognostic models to assist in quick and efficient triage of patients in the covid-19 pandemic might encourage clinicians and policymakers to prematurely implement prediction models without sufficient documentation and validation. Earlier studies have shown that models were of limited use in the context of a pandemic,202 and they could even cause more harm than good.203 Therefore, we cannot recommend any model for use in practice at this point.

The current oversupply of insufficiently validated models is not useful for clinical practice. Moreover, predictive performance estimates obtained from different populations, settings, and types of validation (internal v external) are not directly comparable. Future studies should focus on validating, comparing, improving, and updating promising available prediction models.13 The models by Knight and colleagues143 and Jehi and colleagues141 are good candidates for validation studies in other data. We advise Jehi and colleagues to make all model equations available for independent validation.141 Such external validations should assess not only discrimination, but also calibration and clinical utility (net benefit),193198203 in large datasets187188 collected using an appropriate study design. In addition, these models' transportability to other countries or settings remains to be investigated. Owing to differences between healthcare systems (eg, Chinese and European) and over time in when patients are admitted to and discharged from hospital, as well as in the testing criteria for patients with suspected covid-19, we anticipate most existing models will be miscalibrated, but researchers could try to update and adjust the model to the local setting.
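For the clinical utility (net benefit) assessment recommended above, a minimal sketch with simulated data follows; the decision thresholds here are arbitrary illustrations, not recommendations:

```python
import numpy as np

def net_benefit(y, p, t):
    """Net benefit of treating patients with predicted risk >= threshold t:
    true positive rate minus false positives weighted by the threshold odds."""
    treat = p >= t
    n = len(y)
    tp = np.sum((y == 1) & treat)
    fp = np.sum((y == 0) & treat)
    return tp / n - (fp / n) * t / (1 - t)

rng = np.random.default_rng(2)
p = rng.uniform(0, 1, 1000)  # predicted risks for 1000 simulated patients
y = rng.binomial(1, p)       # observed outcomes
for t in (0.1, 0.2, 0.3):
    treat_all = np.mean(y) - (1 - np.mean(y)) * t / (1 - t)  # treat-everyone reference
    print(t, net_benefit(y, p, t), treat_all)
```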

Most reviewed models used data from a hospital setting, but few are available for primary care and the general population. Additional research is needed, including validation of any recently proposed models not yet included in the current update of the living review (eg, Clift et al204). The models reviewed to date predicted covid-19 diagnosis or assessed the risk of mortality or deterioration, whereas long term morbidity and functional outcomes remain understudied and could be a target outcome of interest in future studies developing prediction models.205206

When creating a new prediction model, we recommend building on previous literature and expert opinion to select predictors, rather than selecting predictors in a purely data driven way.13 This is especially important for datasets with limited sample size.207 Often used predictors included in multiple models identified by our review are vital signs, age, comorbidities, and image features, and these should be considered when appropriate. Influenza-like symptoms should be considered in diagnostic models, and sex, C reactive protein, and lymphocyte counts could be considered as prognostic factors.

By pointing to the most important methodological challenges and issues in the design and reporting of the currently available models, we hope to have provided a useful starting point for further studies, which should preferably validate and update existing models. This living systematic review has been conducted in collaboration with the Cochrane Prognosis Methods Group. We will update this review and appraisal continuously to provide up-to-date information for healthcare decision makers and professionals as more international research emerges over time.

Conclusion

Several diagnostic and prognostic models for covid-19 are currently available and they all report moderate to excellent discrimination. However, these models are all at high or unclear risk of bias, mainly because of model overfitting, inappropriate model evaluation (eg, calibration ignored), use of inappropriate data sources, and unclear reporting. Therefore, their performance estimates are probably optimistic and not representative of the target population. The COVID-PRECISE group does not recommend any of the current prediction models for use in practice, but one diagnostic and one prognostic model originated from higher quality studies and should be (independently) validated in other datasets. For details of the reviewed models, see https://www.covprecise.org/. Future studies aimed at developing and validating diagnostic or prognostic models for covid-19 should explicitly address the concerns raised and follow existing methodological guidance for prediction modelling studies, because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Prediction model authors should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline. Finally, sharing data and expertise for the validation and updating of covid-19 related prediction models is urgently needed.

What is already known on this topic

  • The sharp recent increase in coronavirus disease 2019 (covid-19) incidence has put a strain on healthcare systems worldwide; an urgent need exists for efficient early detection of covid-19 in the general population, for diagnosis of covid-19 in patients with suspected disease, and for prognosis of covid-19 in patients with confirmed disease

  • Viral nucleic acid testing and chest computed tomography imaging are standard methods for diagnosing covid-19, but are time consuming

  • Earlier reports suggest that elderly patients, patients with comorbidities (chronic obstructive pulmonary disease, cardiovascular disease, hypertension), and patients presenting with dyspnoea are vulnerable to more severe morbidity and mortality after infection

What this study adds

  • Seven models identified patients at risk in the general population (using proxy outcomes for covid-19)

  • Thirty three diagnostic models were identified for detecting covid-19, in addition to 75 diagnostic models based on medical images, 10 diagnostic models for severity classification, and 107 prognostic models for predicting, among others, mortality risk and progression to severe disease

  • Proposed models are poorly reported and at high risk of bias, raising concern that their predictions could be unreliable when applied in daily practice

  • Two prediction models (one for diagnosis and one for prognosis) were identified as being of higher quality than others, and efforts should be made to validate these in other datasets

Acknowledgments

We thank the authors who made their work available by posting it on public registries or sharing it confidentially. A preprint version of the study is publicly available on medRxiv.

Footnotes

  • Contributors: LW conceived the study. LW and MvS designed the study. LW, MvS, and BVC screened titles and abstracts for inclusion. LW, BVC, GSC, TPAD, MCH, GH, KGMM, RDR, ES, LJMS, EWS, KIES, CW, JAAD, PD, MCH, NK, AL, KL, JM, CLAN, JBR, JCS, CS, NS, MS, RS, TT, SMJvK, FSvR, LH, RW, GPM, IT, JYV, DLD, JW, FSvR, PH, VMTdJ, MK, ICCvdH, BCTvB, DJM, and MvS extracted and analysed data. MDV helped interpret the findings on deep learning studies and MMJB, LH, and MCH assisted in the interpretation from a clinical viewpoint. RS and FSvR provided technical and administrative support. LW and MvS wrote the first draft, which all authors revised for critical content. All authors approved the final manuscript. LW and MvS are the guarantors. The guarantors had full access to all the data in the study, take responsibility for the integrity of the data and the accuracy of the data analysis, and had final responsibility for the decision to submit for publication. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding: LW, BVC, LH, and MDV acknowledge specific funding for this work from Internal Funds KU Leuven, KOOR, and the COVID-19 Fund. LW is a postdoctoral fellow of Research Foundation-Flanders (FWO) and receives support from ZonMw (grant 10430012010001). BVC received support from FWO (grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037). TPAD acknowledges financial support from the Netherlands Organisation for Health Research and Development (grant 91617050). VMTdJ was supported by the European Union Horizon 2020 Research and Innovation Programme under ReCoDID grant agreement 825746. KGMM and JAAD acknowledge financial support from Cochrane Collaboration (SMF 2018). KIES is funded by the National Institute for Health Research (NIHR) School for Primary Care Research. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. GSC was supported by the NIHR Biomedical Research Centre, Oxford, and Cancer Research UK (programme grant C49297/A27294). JM was supported by Cancer Research UK (programme grant C49297/A27294). PD was supported by the NIHR Biomedical Research Centre, Oxford. MOH is supported by the National Heart, Lung, and Blood Institute of the United States National Institutes of Health (grant R00 HL141678). ICCvdH and BCTvB received funding from Euregio Meuse-Rhine (grant Covid Data Platform (coDaP) interreg EMR-187). The funders played no role in study design, data collection, data analysis, data interpretation, or reporting.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from Internal Funds KU Leuven, KOOR, and the COVID-19 Fund for the submitted work; no competing interests with regard to the submitted work; LW discloses support from Research Foundation-Flanders; RDR reports personal fees as a statistics editor for The BMJ (since 2009), consultancy fees from Roche for giving meta-analysis teaching and advice in October 2018, and personal fees for delivering in-house training courses at Barts and the London School of Medicine and Dentistry, and the Universities of Aberdeen, Exeter, and Leeds, all outside the submitted work; MS coauthored the editorial on the original article.

  • Ethical approval: Not required.

  • Data sharing: The study protocol is available online at https://osf.io/ehc47/. Detailed extracted data on all included studies are available on https://www.covprecise.org/.

  • The lead authors affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

  • Dissemination to participants and related patient and public communities: The study protocol is available online at https://osf.io/ehc47/.

  • Provenance and peer review: Not commissioned; externally peer reviewed.
