Pyramid Framework: Leveraging Large Language Model Randomness for Enhanced Complex Diagnosis

Category: Primary study
Pre-print: ResearchSquare
Year: 2025
The uncertainty in large language model (LLM) responses to clinical diagnostic questions presents both a challenge and an opportunity. We exploit the randomness and diversity of LLM responses to develop the Pyramid Framework, which aims to improve performance in complex diagnosis. Using GPT-4o, Gemini-1.5-Pro, and Claude 3 Opus as sampling models, we evaluated the framework with Claude 3.5 Sonnet as the backbone LLM on 170 challenging cases from NEJM and 67 challenging offline cases. With Claude 3.5 Sonnet as the backbone, the Pyramid Framework achieved 46.1% accuracy and 79.0% coverage on the NEJM dataset, significantly outperforming Chain-of-Thought approaches (35.7% accuracy and 67.5% coverage). Similar improvements were observed on the offline dataset. With o1-mini and o3-mini as the backbone LLM, the framework delivered accuracy improvements of 5.5–24.9% and coverage improvements of 11.9–28.9% across datasets. The framework significantly enhances LLMs' diagnostic performance in complex cases without additional expert-designed prompts, though further validation through prospective diagnostic trials is warranted.
Epistemonikos ID: ce13234a64ff4fd8a1ca058dfb9edaa473da9a9c
First added on: Mar 11, 2025