Healthcare is rapidly adopting digital technology, and digital symptom assessment tools are at the forefront of this shift. Platforms such as WebMD, Symptomate, Ada, and Klinik Health, supported by around 300 GPs, are driving the move towards AI-powered digital triage and more efficient patient management worldwide. However, these tools face significant hurdles, chief among them linguistic limitations. Most AI triage systems cater mainly to English or other European languages and neglect the nuances of many non-European languages, which disadvantages native speakers who prefer to discuss medical matters in their own language. In addition, the accuracy of these systems often depends on symptoms that patients report through text-based interfaces, which are difficult for users of non-Latin scripts. The underlying data also comes predominantly from white European patients, so the systems risk perpetuating health inequalities.
In many developing countries, such as India and Bangladesh, where private healthcare systems provide the majority of services, patients often self-diagnose and self-refer to specialized clinics. Due to inadequate patient education, they frequently end up at the wrong clinic, which delays diagnosis. This often results in significant out-of-pocket healthcare expenses. Each year, 6 million people in Bangladesh fall into extreme poverty due to emergency healthcare costs. We have developed a clinical decision support system that has been trained on the aggregated analytical data of 1.4 million patients to accurately guide patients to the appropriate specialist.
Fig: Patient Vignette Sample Case
To evaluate our recommendation system, we created a rigorous validation framework using a proprietary dataset of 185 patient vignette cases (Gold Standard Data). These cases span a broad range of age groups and genders and include various medical conditions such as asthma, heart disease, and hypertension, along with family medical histories. Each vignette details a primary complaint and additional symptoms, and together the vignettes cover more than 100 primary diseases. The vignettes were crafted by two general physicians with an average of 8 years of experience. Subsequently, a panel of five general physicians reviewed each case to assess the accuracy of the specialist recommendations. We then compared the recommendations provided by our system against both the gold standard and the independent panel of five physicians.
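The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the source does not say how the five panel opinions are combined, so the majority vote below is an assumption, and the function names, specialty labels, and three toy vignettes are all hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label; ties go to the label that appears first."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(predicted, reference):
    """Fraction of cases where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Hypothetical toy data: three vignettes, NOT taken from the real dataset.
gold = ["Cardiology", "Pulmonology", "Neurology"]
panel = [  # one list of five physician opinions per vignette
    ["Cardiology", "Cardiology", "Cardiology", "Internal Medicine", "Cardiology"],
    ["Pulmonology", "Pulmonology", "Internal Medicine", "Pulmonology", "Pulmonology"],
    ["Neurology", "Internal Medicine", "Internal Medicine", "Neurology", "Internal Medicine"],
]
system = ["Cardiology", "Pulmonology", "Internal Medicine"]

# Assumption: the panel's answer for a case is the majority opinion.
panel_consensus = [majority_vote(votes) for votes in panel]

acc_vs_gold = accuracy(system, gold)              # agreement with vignette authors
acc_vs_panel = accuracy(system, panel_consensus)  # agreement with reviewing panel
print(f"vs gold: {acc_vs_gold:.2%}, vs panel: {acc_vs_panel:.2%}")
```

Reporting the two agreement figures separately shows whether the system's disagreements with the gold standard are cases where the reviewing panel also disagreed with the vignette authors.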
Fig: Gender Distribution
When validating our recommendation system, it is essential to use a dataset that is balanced across demographics, including gender. Our vignette dataset achieves near parity in gender representation, with males comprising 51% and females 49% of the cases. This balance helps ensure that the validation results are not skewed towards one gender. A balanced gender distribution also better reflects the diverse patient profiles that healthcare providers encounter, which strengthens confidence in the system's applicability and reliability in real-world settings.
Fig: Age Distribution
Age diversity is another critical factor in the validation of our recommendation system. Our dataset categorizes patient vignettes into three distinct age groups to cover a broad spectrum of the population. Group 1 includes young adults aged 18-40 years and represents 48.6% of the dataset, reflecting common healthcare users who are active and often seek preventive care or management of acute conditions. Group 2 comprises middle-aged adults aged 41-60 years and makes up 43.8% of the cases, addressing the increased healthcare needs and chronic conditions typical of this demographic. Finally, Group 3 includes older adults aged 61-80 years and accounts for 7.6% of the dataset, focusing on the complex healthcare scenarios often seen in this age group. This structured age stratification lets us check that the system's recommendations remain accurate and relevant across different life stages.
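As a small illustration of the stratification, the sketch below assigns ages to the three bands named in the text and computes each band's share of a dataset. The band boundaries come from the text; the function names and the sample ages are hypothetical and exist only to make the example runnable.

```python
from collections import Counter

# Age bands from the text: (label, minimum age, maximum age), both inclusive.
AGE_GROUPS = [
    ("Group 1 (18-40)", 18, 40),
    ("Group 2 (41-60)", 41, 60),
    ("Group 3 (61-80)", 61, 80),
]

def age_group(age):
    """Map an age in years to its band label."""
    for label, low, high in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the 18-80 range covered by the vignettes")

def group_shares(ages):
    """Return each band's share of the cases as a percentage, to one decimal."""
    counts = Counter(age_group(a) for a in ages)
    total = len(ages)
    return {label: round(100 * counts[label] / total, 1) for label, _, _ in AGE_GROUPS}

# Hypothetical ages, NOT the real vignette data.
sample_ages = [22, 35, 40, 41, 58, 63, 29, 47]
shares = group_shares(sample_ages)
print(shares)
```

For the 185 vignettes, counts of 90, 81, and 14 cases would round to the reported 48.6%, 43.8%, and 7.6%. These counts are inferred from the percentages and are not stated in the source.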
Fig: Accuracy Overview
Performance:
Our recommendation system performed well on both evaluation tasks. In Specialization Recommendation, it achieved an accuracy of 88.64%, meaning that in most vignette cases the system recommended the appropriate medical specialist based on the symptoms and medical history presented. A high success rate here means patients are more likely to be directed to the right healthcare professional, which can significantly improve the efficiency of medical consultations and the overall patient journey.
In Disease Recommendation, our system achieved an accuracy of 78.91%. Although this is roughly ten percentage points lower than the specialization accuracy, it is still a meaningful result for identifying likely diseases from the input data. This capability matters because it supports early and accurate detection of potential health issues and allows for timely intervention. It also shows that the system can parse complex medical data and produce outputs that align closely with the assessments a trained physician might make, supporting healthcare providers in delivering high-quality care.
Some diseases, such as asthma and chronic obstructive pulmonary disease, or respiratory infection and bronchitis, are difficult to diagnose accurately even for medical specialists without additional diagnostic tests.
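One way to see which diseases account for the errors is to count how often each pair of labels is confused among the misclassified cases. The sketch below is an assumption about how such an error analysis could be done, not a description of the authors' method; the function name and the six toy cases are hypothetical.

```python
from collections import Counter

def confused_pairs(predicted, reference):
    """Count each unordered pair of labels among the misclassified cases."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = Counter()
    for p, r in zip(predicted, reference):
        if p != r:
            # Sort so (Asthma, COPD) and (COPD, Asthma) count as the same pair.
            pairs[tuple(sorted((p, r)))] += 1
    return pairs

# Hypothetical toy data, NOT results from the real evaluation.
reference = ["Asthma", "COPD", "Bronchitis", "Asthma",
             "Respiratory Infection", "Hypertension"]
predicted = ["COPD", "Asthma", "Respiratory Infection", "Asthma",
             "Respiratory Infection", "Hypertension"]

pairs = confused_pairs(predicted, reference)
for (a, b), n in pairs.most_common():
    print(f"{a} <-> {b}: {n} misclassified case(s)")
```

If most errors fall on clinically similar pairs like these, the gap between disease accuracy and specialization accuracy is easier to interpret, since both diseases in such a pair are usually handled by the same specialty.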