Dr. Iltifat Husain’s physician take is at the end of this article
In a recent research letter published in the Journal of the American Medical Association, "Comparison of Physician and Computer Diagnostic Accuracy," Semigran et al. studied the diagnostic accuracy of symptom-checking apps and websites against that of physicians.
This study was an extension of the researchers' prior work scrutinizing symptom-checking apps and websites for diagnostic accuracy. Here, they compared those same digital tools against clinician accuracy.
The authors compared 23 apps and websites, which they call "symptom checkers," with 234 physicians to determine which group had the higher rate of successful diagnoses. Both groups worked through 45 clinical vignettes (low-, medium-, and high-acuity; common and uncommon) on a platform called Human Dx, where "physicians generate differential diagnoses for clinical vignettes."
Although no physical examinations were conducted in the study, and the co-authors admit that real-world scenarios can be dramatically different, the human physicians "vastly outperformed" their digital counterparts, 84.3% to 51.2%. Physicians were most accurate when diagnosing high-acuity and uncommon vignettes; symptom checkers were most accurate when diagnosing low-acuity and common vignettes.
Dr. Iltifat Husain’s take:
While this study is definitely interesting, it isn't comparing apples to apples, and I have three concerns:
- The 23 symptom-checking tools the study utilized vary significantly, yet the study lumps them all together as one group. These tools differ in the number of questions they ask, the algorithms they use, and much more.
- The study uses the Human Dx project for the clinical vignettes. Based on the Human Dx website, the clinical scenarios and symptoms are created mainly by physicians. This is an important distinction: a symptom-checking tool whose algorithms are built for patient-entered information is unlikely to perform as well on a physician-created vignette.
- The purposes of the symptom-checking tools compared vary. Most are designed only to help with quick patient triage, not diagnosis. The study doesn't address whether robust computer algorithms built for diagnosing patients can diagnose better than physicians; it doesn't address artificial intelligence. I bring that up because many of the news articles quoting this study make it sound like a computer-versus-human chess match, but it's not. In this case, the symptom checkers aren't playing the same "game" as the physicians.