Issue 2, Volume 3 – 2 articles

Open Access

Article

28 April 2026

Harnessing Artificial Intelligence for Hypothesis Generation in Childhood Asthma: Insights from NHANES

Although large language models (LLMs) have advanced substantially, their applicability to epidemiological research has not been sufficiently examined. This study aims to develop and evaluate an LLM-based framework for hypothesis generation and testing, and demonstrates its application to childhood asthma using the National Health and Nutrition Examination Survey (NHANES). A pilot study was conducted to explore factors associated with childhood asthma in the 2001–2020 NHANES cycles. A modular agent system was developed, comprising four tools (Database Query, Statistic, Paper Search, and Paper Download) and two LLM modules (Key Generator and Hypothesis Tester). Multivariable logistic regression was used to test the association between each variable and current asthma, generating a tentative affirmative claim. The Key Generator module produced keywords for the literature search, the Paper Search and Paper Download tools queried PubMed and retrieved relevant studies, and the Hypothesis Tester module synthesized the evidence and determined the degree of support for the claim about each variable. Keywords and conclusions were reviewed by researchers and validated using multiple LLMs (ChatGPT, DeepSeek, and Gemini) to ensure consistency and robustness. A total of 25,839 children with (n = 2928) and without (n = 22,911) current asthma and 10,359 variables were included in the multivariable analysis, which yielded 100 variables associated with asthma. Of these, 21 were directly related to asthma (with supporting published studies), 43 were indirectly related to asthma (based on background knowledge, though not explicitly discussed in the available publications), and 34 were unrelated to asthma. Two variables were excluded due to a lack of discriminative keywords. This study demonstrates the effectiveness of LLM-based models for generating and testing hypotheses about childhood asthma.
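
The control flow described in this abstract can be pictured with a minimal sketch. The abstract does not describe the authors' actual interfaces, so every name here (`run_pipeline`, `Verdict`, and the three callables passed in) is a hypothetical placeholder. The sketch shows only the ordering of the steps: keywords first, then literature retrieval, then a verdict, with a variable excluded when no discriminative keywords are produced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    variable: str
    # One of "direct", "indirect", "unrelated", or "excluded" (labels are assumptions).
    category: str


def run_pipeline(
    variables: List[str],
    key_generator: Callable[[str], List[str]],      # hypothetical stand-in for the Key Generator
    paper_search: Callable[[List[str]], List[str]],  # hypothetical stand-in for Paper Search/Download
    hypothesis_tester: Callable[[str, List[str]], str],  # hypothetical stand-in for the Hypothesis Tester
) -> List[Verdict]:
    """Route each regression-flagged variable through the three stages in order."""
    verdicts = []
    for variable in variables:
        keywords = key_generator(variable)
        if not keywords:
            # No discriminative keywords: the variable is set aside, not classified.
            verdicts.append(Verdict(variable, "excluded"))
            continue
        papers = paper_search(keywords)
        verdicts.append(Verdict(variable, hypothesis_tester(variable, papers)))
    return verdicts
```

Passing the stages in as callables keeps the sketch independent of any particular LLM or search backend, so each stage can be replaced by a stub when checking the flow.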

Open Access

Article

28 May 2026

Striking Surge in Lung Cancer Incidence in Children of Early Life

Lung cancer ranks first in cancer mortality and third in total cancer cases diagnosed in the US. Epidemiological trends may vary among age groups, and the dynamics of risk factors evolve as well. We aim to characterize trends in lung cancer among different age groups over the past two decades using the Surveillance, Epidemiology, and End Results (SEER) datasets from the National Cancer Institute (NCI), and to delineate possible root causes. The SEER datasets were obtained from NCI. Data on environmental risk factors were acquired from the Environmental Protection Agency and the United States Geological Survey. Tobacco consumption data were sourced from the Centers for Disease Control and Prevention. Trends were examined statistically with the Mann-Kendall test. The incidence rate of lung cancer in the <15 age group has risen over the past two decades, most strikingly among infants in the 0 age group (from birth to less than 1 year old). These findings were unique to lung cancer. The use of e-cigarettes among pregnant women increased, while the potential influence of other known risk factors declined. A shrinking infant population and a higher rate of pregnancy loss were observed over the same timespan. A striking rise in lung cancer incidence among infants has been identified, opposite to the declining trend in the overall population, which might be related to increased e-cigarette use among pregnant women. Urgent further investigation is warranted to safeguard newborns from the potential continued impact of lung cancer.
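
For readers unfamiliar with the Mann-Kendall trend test named in this abstract, the following is a minimal sketch of the standard test (S statistic, tie-corrected variance, continuity-corrected normal approximation). It is not the authors' code; the function name `mann_kendall` is an assumption, and the normal approximation is only a rough guide for very short series.

```python
import math
from collections import Counter
from itertools import combinations


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for a monotonic-trend test on an ordered series."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts increasing pairs minus decreasing pairs over all (earlier, later) pairs.
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(series, 2))

    # Variance of S under the null hypothesis, corrected for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if var_s == 0:
        # All values are identical: there is no trend to detect.
        return s, 0.0, 1.0

    # Continuity correction: shrink |S| by one before standardizing.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p
```

Applied to a yearly incidence series in chronological order, a positive Z with a small p-value would indicate a rising monotonic trend, and a negative Z a falling one.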
