Large language models (LLMs) have shown potential for medical and health question answering across various health-related tests spanning different formats and sources, such as multiple choice and short answer exam questions (e.g., USMLE MedQA), summarization, and clinical note taking, among others. Especially in low-resource settings, LLMs can potentially serve as valuable decision-support tools, enhancing clinical diagnostic accuracy and accessibility, and providing multilingual clinical decision support and health training, all of which are especially valuable at the community level.

Despite their success on existing medical benchmarks, it is uncertain whether these models generalize to tasks involving distribution shifts in disease types, contextual differences across symptoms, or variations in language and linguistics, even within English. Further, localized cultural contexts and region-specific medical knowledge are important for models deployed outside of traditional Western settings. Yet without diverse benchmark datasets that reflect the breadth of real-world contexts, it is impossible to train or evaluate models for these settings.

To address this gap, we present AfriMed-QA, a benchmark question–answer dataset that brings together consumer-style questions and medical school–style exam questions from 60 medical schools across 16 countries in Africa. We developed the dataset in collaboration with numerous partners, including Intron Health, SisonkeBiotik, the University of Cape Coast, the Federation of African Medical Students' Associations, and BioRAMP, which collectively form the AfriMed-QA consortium, with support from PATH/The Gates Foundation. We evaluated LLM responses on these datasets, comparing them to answers provided by human experts and rating their responses according to human preference. The methods used in this project can be scaled to other locales where digitized benchmarks may not currently be available.
