For two decades, scientists have compared the full set of DNA of each person they study to a template that relies mostly on genetic material from one man affectionately known as “the guy from Buffalo.”
But they've long known that this template for comparison, or “reference genome,” has serious limits because it doesn't reflect the spectrum of human diversity.
“We need a really good understanding of the variations, the differences between human beings,” said genomics expert Benedict Paten of the University of California, Santa Cruz. “We’re missing out.”
Now, scientists are building a much more diverse reference that they call a “pangenome,” which so far includes the genetic material of 47 people from various places around the world. It’s the subject of four studies published Wednesday in the journals Nature and Nature Biotechnology. Scientists say it's already teaching them new things about health and disease and should help patients down the road.
Paten said the new reference should help scientists understand more about what’s normal and what’s not. “It is only by understanding what common variation looks like that we’ll be able to say, ‘Oh, this big structural variation that affects this gene? Don’t worry about it,’” he said.
A human genome is the set of instructions to build and sustain a human being. Experts define a pangenome as a collection of whole genome sequences from many people, designed to represent the genetic diversity of the human species. The pangenome is not a composite but a collection; scientists depict it as a rainbow of stacked genomes, in contrast to the single line that represents the older reference genome.
The Human Pangenome Project builds upon the first sequencing of a complete human genome, which was nearly completed more than two decades ago and finally finished last year. Paten, a pangenome study author and project leader, said 70% of that first reference genome came from an African American man with mixed African and European ancestry who answered an ad for volunteers in a Buffalo newspaper in 1997. About 30% came from a mix of around 20 people.
The pangenome contains material from 24 people of African ancestry, 16 from the Americas and the Caribbean, six from Asia and one from Europe.
Although any two people’s genomes are more than 99% identical, Paten said “it’s those differences that are the things that genetics and genomics is concerned with studying and understanding.”
It may take a while for patients to see concrete benefits from the research. But scientists said new insights should eventually make genetic testing more accurate, improve drug discovery and bolster personalized medicine, which uses someone’s unique genetic profile to guide decisions for preventing, diagnosing and treating disease.
“The Pangenome Project gives a more accurate representation of the genome of people from around the world,” and should help doctors better diagnose genetic conditions, said clinical genetics expert Dr. Wendy Chung at Columbia University, who was not involved in the research.
If someone has a variation in a certain gene, that variation could be compared against the rainbow of reference genomes.
Study author Evan Eichler of the University of Washington said researchers will also learn more about genes already linked to problems, such as one tied to cardiovascular disease in African Americans.
“Now that we can actually sequence that gene in its entirety and we can understand the variation in that gene, we can start to go back to unexplained cases of patients with coronary heart disease” and look at them in light of the new knowledge, he said. Eichler is paid by the Howard Hughes Medical Institute, which also supports The Associated Press’s health and science department.
University of Minnesota plant genetics expert Candice Hirsch, who wasn't involved in the research but has closely followed the effort, said she expects many discoveries to flow from it. Until now, “we really have only been able to scratch the surface of understanding the genetics that underlies disease,” she said.
The consortium leading the research is part of the Human Genome Reference Program, which is funded by an arm of the U.S. National Institutes of Health.
The team is in the process of adding to the collection of reference genomes, with the goal of having sequences from 350 people by the middle of next year. Scientists are also hoping to work more with international partners, including those focusing on Indigenous populations.
“We're in it for the long game,” Paten said.