SCIEPublish

Harnessing Artificial Intelligence for Hypothesis Generation in Childhood Asthma: Insights from NHANES

Article Open Access

Author Information
1 Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA 15224, USA
2 Department of Biostatistics and Health Data Science, School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.

Received: 02 December 2025 Revised: 24 December 2025 Accepted: 15 April 2026 Published: 28 April 2026

© 2026 The authors. This is an open access article under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

J. Respir. Biol. Transl. Med. 2026, 3(2), 10003; DOI: 10.70322/jrbtm.2026.10003
ABSTRACT: Although large language models (LLMs) have advanced substantially, their applicability to epidemiological research has not been sufficiently examined. This study aimed to develop and evaluate an LLM-based framework for hypothesis generation and testing, and to demonstrate its application to childhood asthma in the National Health and Nutrition Examination Survey (NHANES). A pilot study was conducted to explore factors associated with childhood asthma in the 2001–2020 NHANES cycles. A modular agent system was developed, comprising four tools (Database Query, Statistic, Paper Search, and Paper Download) and two LLM modules (Key Generator and Hypothesis Tester). Multivariable logistic regression was used to test the association between each variable and current asthma, generating a tentative affirmative claim. The Key Generator module produced keywords for the literature search; the Paper Search and Paper Download tools queried PubMed and retrieved relevant studies; and the Hypothesis Tester module synthesized the evidence and determined the support for the claim about each variable. Keywords and conclusions were reviewed by researchers and validated using multiple LLMs (ChatGPT, DeepSeek, and Gemini) to ensure consistency and robustness. A total of 25,839 children, with (n = 2928) or without (n = 22,911) current asthma, and 10,359 variables were included in the multivariable analysis, which yielded 100 variables associated with asthma. Of these, 21 were directly related to asthma (supported by published studies), 43 were indirectly related to asthma (based on background knowledge, though not explicitly discussed in the available publications), and 34 were unrelated to asthma. Two variables were excluded due to a lack of discriminative keywords. This study demonstrates the effectiveness of LLM-based models for generating and testing hypotheses about childhood asthma.
Keywords: Artificial intelligence; Asthma; Children; Risk factors
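The flow described in the abstract (keywords, then literature search, then a verdict per variable) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Verdict`, `Candidate`, `run_pipeline`, and the three `toy_*` functions are hypothetical, the toy functions stand in for the LLM modules and the PubMed tools, and the variable names are placeholders rather than NHANES variables.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, List


class Verdict(Enum):
    """Outcome categories named in the abstract."""
    DIRECT = "directly related"
    INDIRECT = "indirectly related"
    UNRELATED = "unrelated"
    EXCLUDED = "excluded (no discriminative keywords)"


@dataclass
class Candidate:
    name: str   # placeholder variable label (hypothetical)
    claim: str  # tentative affirmative claim from the regression step


def run_pipeline(
    candidates: Iterable[Candidate],
    key_generator: Callable[[Candidate], List[str]],
    paper_search: Callable[[List[str]], List[str]],
    hypothesis_tester: Callable[[Candidate, List[str]], Verdict],
) -> Dict[str, Verdict]:
    """Run keyword generation, literature search, and claim testing per variable."""
    verdicts: Dict[str, Verdict] = {}
    for cand in candidates:
        keywords = key_generator(cand)
        if not keywords:
            # No usable keywords: the variable cannot be searched for.
            verdicts[cand.name] = Verdict.EXCLUDED
            continue
        papers = paper_search(keywords)
        verdicts[cand.name] = hypothesis_tester(cand, papers)
    return verdicts


# Toy stand-ins (assumptions); real modules would call an LLM and PubMed.
def toy_key_generator(cand: Candidate) -> List[str]:
    return [] if cand.name == "var_generic" else [cand.name, "asthma", "children"]


def toy_paper_search(keywords: List[str]) -> List[str]:
    index = {"var_known": ["paper-1", "paper-2"]}  # pretend literature index
    return index.get(keywords[0], [])


def toy_hypothesis_tester(cand: Candidate, papers: List[str]) -> Verdict:
    if papers:
        return Verdict.DIRECT
    return Verdict.INDIRECT if cand.name == "var_plausible" else Verdict.UNRELATED


candidates = [
    Candidate("var_known", "var_known is associated with current asthma"),
    Candidate("var_plausible", "var_plausible is associated with current asthma"),
    Candidate("var_noise", "var_noise is associated with current asthma"),
    Candidate("var_generic", "var_generic is associated with current asthma"),
]
verdicts = run_pipeline(
    candidates, toy_key_generator, toy_paper_search, toy_hypothesis_tester
)
counts = Counter(verdicts.values())
```

In this sketch every candidate ends in exactly one category, which mirrors the abstract's accounting of the 100 associated variables as 21 direct, 43 indirect, 34 unrelated, and 2 excluded.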