Given the vast literature on health disparities and minority health, and the limits that traditional manual review places on scale and scope, computational methods are useful for conducting large-scale assessments. The Health Disparities and Minority Health (HDMH) Monitor aims to leverage natural language processing and machine learning methods to perform a comprehensive scoping review: characterize major topics found in the HDMH literature, examine changes in topic mentions over time, identify notable gaps in coverage, and derive actionable insights for further inquiry.
Guided by the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) framework, we adapted computational approaches to several review components, including search, data charting (i.e., extraction), and synthesis of results.
For our data source, we used the National Library of Medicine (NLM) MEDLINE database, a bibliographic repository that contains more than 28 million biomedical references. MEDLINE and its public search interface, PubMed, are the primary tools for performing biomedical literature searches among health professionals and students.
Using this data source also enabled us to employ the MEDLINE/PubMed Health Disparities and Minority Health Search Strategy established by the NLM. This strategy incorporates 257 subject terms and free-text strings found in titles and abstracts. We included all articles published in English regardless of study design or article type. To minimize selection bias, our search included only articles with publication dates between 1975 and 2020, inclusive. We also limited our search to articles whose abstracts contain at least fifty words, to ensure sufficient text for information extraction and characterization.
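The inclusion criteria above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the `Article` record, its field names, the `"eng"` language code, and whitespace-based word counting are all assumptions made for illustration, and the NLM search strategy itself is not reproduced here.

```python
from dataclasses import dataclass

# Thresholds taken from the inclusion criteria described above.
MIN_YEAR = 1975
MAX_YEAR = 2020
MIN_ABSTRACT_WORDS = 50


@dataclass
class Article:
    """Hypothetical record for an article already retrieved by the search."""
    pmid: str
    language: str  # assumed to be a three-letter code such as "eng"
    year: int      # publication year
    abstract: str


def meets_inclusion_criteria(article: Article) -> bool:
    """Return True if the article is in English, was published between
    1975 and 2020 (inclusive), and has an abstract of at least fifty words."""
    if article.language != "eng":
        return False
    if not (MIN_YEAR <= article.year <= MAX_YEAR):
        return False
    # Assumption: words are counted by splitting on whitespace.
    return len(article.abstract.split()) >= MIN_ABSTRACT_WORDS


def filter_articles(articles):
    """Keep only the articles that satisfy every inclusion criterion."""
    return [a for a in articles if meets_inclusion_criteria(a)]
```

A different tokenizer could change which abstracts near the fifty-word threshold are kept, so the word-counting rule is worth pinning down explicitly in any real implementation.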
This project was supported in part by a National Library of Medicine (NLM) Biomedical Informatics and Data Science Research Training Grant (T15LM007079) and a Computational and Data Science Fellowship from the Association for Computing Machinery Special Interest Group in High Performance Computing (ACM SIGHPC). Funding agencies had no role in the design and conduct of this work; collection, management, analysis, and interpretation of the data; preparation, review, or approval; or decision to create the dashboard or submit any associated manuscripts for publication.
Please cite the following work in papers or derivative software:
Reyes Nieva H, Bakken S, Elhadad N. Mining the Health Disparities and Minority Health Bibliome: A Computational Scoping Review and Gap Analysis of 200,000+ Articles. 2023. medRxiv. doi: 10.1101/2023.10.17.23296754.