In a recent study published in JAMA Dermatology, a group of researchers tested four smartphone applications designed to provide a melanoma risk assessment of skin lesions – with concerning results. They found that the sensitivity of the apps tested ranged from as low as 6.8% to 98.1% and specificity ranged from 20.4% to 93.7%. These results raise a number of concerns about the proliferation of these largely untested apps. At the same time, this study also highlights opportunities that could help improve this technology and potentially improve patient care.
We’ve discussed smartphone apps designed for detecting melanoma using both native apps and peripherals before, including previous studies aimed at assessing this technology. We have also come across apps intended for use in conditions such as diabetes, tinnitus, cardiac disease, and acne.
While much of this technology remains largely untested despite being out in the market, that does not mean it lacks potential. This study is hopefully the first of many that apply the same techniques we use to assess other diagnostic tools or therapies to prove efficacy.
In this study, Wolf and colleagues from the University of Pittsburgh used a database of pre-op skin lesion images, each of which has an associated pathologic diagnosis. Narrowing to high-quality images with no identifying features and with a clear diagnosis (e.g. “melanoma” rather than “cannot rule out melanoma”), the researchers collected 60 images of melanoma lesions and 120 images of benign lesions. Worth noting, the benign lesions include a number of lesions that one may not typically consider in the differential for a lesion that looks like melanoma, such as a hemangioma.
One important critique of this approach, though, is that the “population” they studied is enriched for melanoma – the real-world correlate would be that 1/3 of patients presenting to a dermatologist for a suspicious lesion in fact have melanoma. It’s important to remember that prevalence is a big part of positive and negative predictive values – and the lower real-world prevalence is likely to significantly alter the reported values for these apps.
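To make the prevalence point concrete, here is a minimal sketch of how positive and negative predictive values follow from sensitivity, specificity, and prevalence. The function name and the example inputs (90% sensitivity, 40% specificity, 2% real-world prevalence) are illustrative assumptions, not figures from the study; only the 1/3 prevalence reflects the study's mix of 60 melanoma and 120 benign images.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence.

    All arguments are fractions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative result)
    return ppv, npv


# Hypothetical high-sensitivity / low-specificity app (assumed values).
SENSITIVITY = 0.90
SPECIFICITY = 0.40

# Study-like sample: 60 melanomas out of 180 images.
ppv_study, npv_study = predictive_values(SENSITIVITY, SPECIFICITY, 60 / 180)

# Assumed lower real-world prevalence, chosen only for illustration.
ppv_real, npv_real = predictive_values(SENSITIVITY, SPECIFICITY, 0.02)

print(f"Study-like prevalence: PPV={ppv_study:.3f}, NPV={npv_study:.3f}")
print(f"Lower prevalence:      PPV={ppv_real:.3f}, NPV={npv_real:.3f}")
```

With the same sensitivity and specificity, dropping the prevalence pushes the PPV down sharply while the NPV rises, which is why values measured on an enriched sample should not be read as real-world performance.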
They then selected smartphone applications which analyzed images that are downloaded to the app rather than those that utilize images captured in-app. Of the four apps identified, three (Apps 1-3) use a software algorithm to provide a risk assessment and one (App 4) sends the image to a dermatologist for a risk assessment. Of note, these apps were not identified – a move I suspect was for legal reasons.
The results were even more variable than expected. Apps 1 and 2 seem to represent a high sensitivity/low specificity approach – meaning they catch more of the true disease but also often falsely identify benign lesions as melanoma. This approach corresponds as expected to a lower positive predictive value and higher negative predictive value; in other words, if the app says it’s not melanoma then it probably isn’t (around 72%) but if it says it is melanoma there is still only a 30% chance that it is actually melanoma. App 3 takes the opposite approach in terms of sensitivity and specificity. However, things don’t seem to pan out in terms of PPV/NPV – despite a high specificity, the positive predictive value is still low.
App 4 is more akin to a tele-dermatology approach, something which has been more widely tested. The sensitivity of this approach proved to be quite high, with a lower specificity. With a remote dermatologist assessing these lesions, it’s not all that surprising to see these results.
Perhaps the most important takeaway of this study is that it serves as an excellent example of the kind of research we need in mHealth. Granted, it is not a true representation of the real-world use of these apps, but it’s a step in the right direction, akin to the pre-clinical studies we rely on to at least demonstrate potential efficacy.
As for the study itself, it adds even more weight to a concern that we’ve raised many times before – that untested, unvalidated apps proliferating in the market and claiming diagnostic or therapeutic value are potentially dangerous. For melanoma, early diagnosis is critical. Smartphone apps like these, particularly those with a low sensitivity, meaning they often miss true disease, could give patients a false sense of security that ultimately delays diagnosis.
However, unlike many others who have reported these results, I don’t see this as a wholesale rejection of the idea of using smartphones in this setting. It’s worth noting that these apps do reach sensitivities and specificities that are better than many lab tests we send. Furthermore, this study is very specific in the type of app it evaluates – we’ve previously discussed apps that use augmented reality and dermatoscope peripherals that may perform better. Thorough evaluation and testing of apps like these, coupled with iterative refinement, could very well produce far better results.
One point worth noting, though, is that whether it’s software or a remote physician, these apps cannot replace a comprehensive in-person evaluation. In a 2011 study looking at a tele-dermatology system in which a suspicious lesion triggered a trip to the dermatologist, one important finding was that 40% of malignant lesions then identified were not the index lesion that prompted the visit. This study further highlights the importance of rigorous testing not only for design and validation, but also to identify unintended consequences.
In our experience, smartphone apps now on the market that claim to aid in diagnosis or therapy of medical conditions are almost wholly untested, relying instead on loosely related research to assert some validity. But while the current cadre of apps may be lacking, that is not likely to remain the case.
Ultimately, a smartphone-based tool – whether an app alone or with a peripheral – that has a high sensitivity and reasonable specificity may find use as a good screening tool. Considering the algorithmic approach we use to assess melanoma risk in most primary care settings – the ABCDE approach – it’s not hard to imagine that such an app should be feasible to design. Technology like this has the potential to help speed diagnosis, improve access to care, and more – if we apply the same rigor and discipline to the design and subsequent evaluation of these tools as we do to every other medical intervention.
Reference: Wolf JA, Moreau JF, Akilov O, Patton T, English JC 3rd, Ho J, Ferris LK. Diagnostic Inaccuracy of Smartphone Applications for Melanoma Detection. JAMA Dermatology. January 16, 2013; E1-E4.