The article discusses a study by researchers at the Icahn School of Medicine at Mount Sinai, who tested several large language models (LLMs), including ChatGPT 3.5, ChatGPT 4, Gemini Pro, LLaMA v2, and Mixtral-8x7B, to assess their ability to practice evidence-based medicine. The researchers prompted each model with clinical test cases, asking it to suggest a treatment protocol, incorporate the resulting findings, and recommend subsequent actions. ChatGPT 4 was the most accurate, achieving 74% accuracy and significantly outperforming the other models.