
Bringing Genetic Diversity to the Forefront: Moez Dawood on the AllofUs 250k Annotator for OpenCRAVAT

AllofUs 250K Annotator

Moez Dawood, an MD/PhD student at Baylor College of Medicine, has recently developed the AllofUs 250k annotator for OpenCRAVAT, a genomic annotation platform designed to handle both single variant lookups and large-scale data processing. His work brings one of the most diverse genomic resources in the world directly into variant annotation pipelines, making it easier for researchers and clinical laboratories to incorporate ancestry-specific allele frequency data into their analyses.

Moez just completed his PhD in the Genetics and Genomics program, working with Dr. Richard Gibbs and Dr. James Lupski. His research spans the full spectrum of rare disease genomics, from patient recruitment and consent to sequencing, functional genomics, and variant interpretation. By working across the entire pipeline, he has developed a strong sense of how genomic tools need to be built to work at scale and to work equitably. His motivation for this annotator was straightforward: the All of Us Research Program has intentionally recruited participants from diverse genetic ancestries, and the data produced is uniquely positioned to improve rare variant interpretation.

The All of Us dataset currently includes genomic data from 250,000 participants, with allele counts, allele numbers, and allele frequencies broken down by calculated genetic ancestry groups such as African, East Asian, South Asian, Admixed American, Middle Eastern, and European. While other databases remain invaluable, they have historically underrepresented certain populations. Moez saw an opportunity to fill that gap by making the All of Us Variant Annotation Table directly available through OpenCRAVAT. Instead of being limited to searching a website or downloading unwieldy data files, researchers can now integrate this ancestry-aware frequency information directly into their own annotation workflows.

Developing the annotator required both persistence and technical problem-solving. The sequencing for All of Us is performed by three genome centers, including Baylor, and the raw data is harmonized by the All of Us Data Coordinating Center. After obtaining approval for the project, Moez downloaded the variant annotation tables, reformatted them, converted them into a SQL database, and performed quality control before building the OpenCRAVAT module.
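For readers curious what the reformat, convert, and quality-control steps might look like in practice, here is a minimal sketch using Python's built-in sqlite3 module. It is not Moez's actual pipeline: the column names, the sample rows, and the specific QC rule (a positive allele number, and an allele count no larger than it) are assumptions made purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows. The real Variant Annotation Table has its own
# column names and many more fields; these are placeholders for illustration.
SAMPLE_TSV = (
    "chrom\tpos\tref\talt\tac\tan\n"
    "chr1\t12345\tA\tG\t5\t1000\n"
    "chr1\t67890\tC\tT\t0\t800\n"
    "chr2\t222\tG\tGA\t40\t0\n"
)


def build_db(tsv_text, conn):
    """Load tab-delimited variant rows into SQLite, skipping rows that fail QC.

    Returns a (kept, skipped) tuple of row counts.
    """
    conn.execute(
        "CREATE TABLE variants ("
        "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
        "ac INTEGER, an INTEGER, af REAL)"
    )
    kept = skipped = 0
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        ac, an = int(row["ac"]), int(row["an"])
        # Assumed QC rule: allele number must be positive and the allele
        # count cannot exceed it; otherwise a frequency cannot be computed.
        if an <= 0 or ac < 0 or ac > an:
            skipped += 1
            continue
        conn.execute(
            "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)",
            (row["chrom"], int(row["pos"]), row["ref"], row["alt"], ac, an, ac / an),
        )
        kept += 1
    # An index on the variant key keeps single-variant lookups fast
    # even when the table is very large.
    conn.execute("CREATE INDEX idx_variant ON variants (chrom, pos, ref, alt)")
    conn.commit()
    return kept, skipped


def lookup(conn, chrom, pos, ref, alt):
    """Return (ac, an, af) for one variant, or None if it is not in the table."""
    return conn.execute(
        "SELECT ac, an, af FROM variants "
        "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
        (chrom, pos, ref, alt),
    ).fetchone()
```

Calling build_db(SAMPLE_TSV, sqlite3.connect(":memory:")) loads the two valid rows and skips the one with an allele number of zero. The index on chromosome, position, and alleles is the kind of structure that lets a large table answer per-variant queries quickly.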

Once the data was prepared, integrating it into OpenCRAVAT was straightforward thanks to the platform’s modular architecture and Python backend. The annotator installs like any other module and requires no external dependencies. Moez has since deployed it both locally and in cloud environments, enabling large-scale use cases. The data footprint is significant, around 140 GB, but the indexing and structure allow it to perform efficiently even on complex datasets.

The annotator is already in active use for rare disease research. In Moez’s work, it adds confidence during variant filtering by providing more accurate frequency information for underrepresented populations, helping to rule out common variants that may not be flagged in other databases. The benefit extends beyond rare disease, as the dataset can be applied to population genetics studies, adult-onset disease research, and the discovery of novel gene–disease associations. Moez points out that adult genetics is still relatively underexplored compared to pediatric genetics, and biobanks like All of Us and the UK Biobank are now providing unprecedented opportunities to study genetic architecture in large adult cohorts.
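The filtering idea can be sketched in a few lines of Python. This is an illustration rather than the annotator's own logic: the ancestry labels, the 1% threshold, and the function names are all hypothetical, and a real analysis would choose a cutoff suited to the disease model in question.

```python
from typing import Dict, Optional


def max_ancestry_af(afs: Dict[str, Optional[float]]) -> Optional[float]:
    """Return the highest allele frequency across ancestry groups.

    Groups with no frequency (None) are ignored. Returns None when the
    variant has no frequency in any group.
    """
    observed = [af for af in afs.values() if af is not None]
    return max(observed) if observed else None


def is_rare(afs: Dict[str, Optional[float]], threshold: float = 0.01) -> bool:
    """Keep a variant only if it is below the threshold in every group.

    A variant that is common in even one ancestry group is unlikely to
    cause a rare disease, so it is filtered out. Variants with no
    frequency data at all are kept for further review.
    """
    peak = max_ancestry_af(afs)
    return peak is None or peak < threshold
```

Under these assumptions, a variant with a frequency of 0.03 in one group and 0.0001 in another is filtered out even though it looks rare in the second group. That is exactly the case where ancestry-specific data changes the outcome.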

Early validation has shown the utility of the resource. Comparing allele frequencies for ClinVar variants between All of Us and gnomAD revealed the expected agreement for common variants, while highlighting key differences at rare and ultra-rare sites. In practical terms, this means there are thousands of variants per genome that lack frequency annotations in gnomAD but are annotated in All of Us, giving researchers actionable information they did not previously have. Moez and his colleagues presented these findings at the 2025 American College of Medical Genetics and Genomics annual meeting and are preparing a manuscript to describe the implementation and use cases in more detail.

The AllofUs 250k annotator is part of a broader effort by Moez and his collaborators to build a flexible, scalable annotation platform using OpenCRAVAT. Alongside this project, they have released a companion annotator for the Regeneron Million Exome Project and have integrated large language model datasets for additional analytical depth. While Moez will soon return to medical school to complete his final year, he plans to remain active in developing and maintaining genomic resources for the research community.

Researchers and clinical labs can download the AllofUs 250k annotator from the OpenCRAVAT module store and begin incorporating ancestry-specific allele frequency data into their own workflows. Those who use the tool are encouraged to share their findings with the All of Us community and the OpenCRAVAT team. Moez is also happy to connect directly and can be reached at mdawood@bcm.edu.

The All of Us Research Program and its expanding genomic dataset represent a significant step toward inclusive precision medicine. By bringing this data into widely used annotation platforms, tools like the AllofUs 250k annotator make it possible for the benefits of diversity in genomic research to translate directly into better science, more accurate variant interpretation, and ultimately improved patient care.