Machine learning model detects COVID-19 after 3 days of self-reported symptoms


Using only the first three days of self-reported symptoms, a machine learning model was able to detect COVID-19 infection at an early stage.

Rapid detection of COVID-19 infection is vital to containing the spread of the virus. Although PCR testing has become the most widely used analytical technique for detecting the virus, its result depends heavily on when the sample was taken, the type of sample, and the quality of the sample. Another way to identify infected people is to screen on combinations of symptoms, so that only those with a suggestive symptom profile are tested. This approach was used in an Italian study of nearly 3000 subjects, in which a short diagnostic scale correctly identified symptoms associated with the infection. The same approach underlies the COVID-19 Symptom Study App, a longitudinal study of the self-reported symptom profile of patients with COVID-19. Using machine learning, the study has developed models that identify the main symptoms of infection and their correlation with test results. However, those models are not suited to early detection of infection. This prompted the COVID-19 Symptom Study team to create a machine learning model that uses only the symptoms self-reported during the first three days to predict the likelihood that an individual is positive for COVID-19.

The team compared three models for analyzing self-reported symptoms. The first was based on the NHS algorithm, which treats the presence of cough, fever or loss of smell between days 1 and 3 as potentially indicative of COVID-19 infection. The second, a logistic regression model, incorporates loss of smell, persistent cough, fatigue and skipped meals; it has been previously validated and found to correlate well with COVID-19 infection. The third, a hierarchical Gaussian process model, used 18 self-reported symptoms combined with comorbidities and demographics. The three models were compared in terms of sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). They were evaluated using a training set of patients self-reporting symptoms between April and October 2020 and a test set of symptoms self-reported between October and November 2020.
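As an illustration of the simplest of the three approaches, the NHS rule described above can be sketched in a few lines of Python. This is only a minimal sketch: the symptom keys (`persistent_cough`, `fever`, `loss_of_smell`), the function name `nhs_rule`, and the one-dictionary-per-day input format are assumptions made for the example, not the app's actual data format or the study's code.

```python
from typing import Dict, List

# Hypothetical symptom keys; the app's real field names are not given in the article.
NHS_SYMPTOMS = ("persistent_cough", "fever", "loss_of_smell")


def nhs_rule(daily_reports: List[Dict[str, bool]], days: int = 3) -> bool:
    """Flag a participant if any NHS core symptom appears in the first `days` reports.

    `daily_reports` is assumed to hold one dict per day since symptom onset,
    mapping a symptom name to whether it was reported that day.
    """
    for report in daily_reports[:days]:
        if any(report.get(symptom, False) for symptom in NHS_SYMPTOMS):
            return True
    return False


# Made-up example: fever reported on day 2, so the participant is flagged.
example = [{"fatigue": True}, {"fever": True}, {}]
print(nhs_rule(example))
```

A rule like this needs no training data, which is why it serves as a baseline against the two statistical models.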

The training set comprised data from 182,991 participants and the test set 15,049, with a similar symptom distribution. The predictive power of the three models differed. Using three days of symptoms, the hierarchical Gaussian process model showed the highest predictive value (AUC = 0.80, 95% CI 0.80-0.81), compared with the logistic regression model (AUC = 0.74) and the NHS model (AUC = 0.67). The hierarchical Gaussian process model had a sensitivity of 73% and a specificity of 72%. Its sensitivity was higher than that of the logistic regression model (59% sensitivity, 76% specificity) and the NHS model (60% sensitivity, 75% specificity), although its specificity was slightly lower. The main symptoms predictive of the onset of COVID-19 were loss of smell, chest pain, persistent cough, abdominal pain, blisters on the feet, eye pain, and unusual pain.
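For readers unfamiliar with these metrics, the sketch below shows how sensitivity, specificity and AUC are computed in general. It is not the study's evaluation code: the function names are hypothetical and the four labels and scores are made-up toy values chosen only to make the arithmetic easy to follow. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = tested positive, 0 = tested negative.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]                  # model-predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.5)
print(auc(y_true, scores))                      # 0.75
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can have the highest AUC and sensitivity while showing a slightly lower specificity at a given operating point.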

The authors concluded that the hierarchical Gaussian process model was able to successfully predict early signs of infection and could be used to allow referral for testing and self-isolation when these symptoms were present.

Canas LS et al. Early detection of COVID-19 in the UK using self-reported symptoms: a large-scale prospective epidemiological surveillance study. Lancet Digit Health 2021

