An informatics problem: confusion over the meaning of the term “gene”

There is an issue I rarely if ever see addressed in the healthcare informatics world, but one that looms much larger in bioinformatics and pharmacogenomics: what is a gene, anyway? My students have struggled with my insistence that they memorize two definitions: the gene as a unit of heredity, recessive or dominant, and the gene as a DNA sequence that codes for a protein (as in this Wikipedia graphic):

A couple of years ago the New York Times ran an excellent piece about changing views of the gene. They went so far as to characterize the gene as having an “identity crisis”:

new large-scale studies of DNA are causing her and many of her colleagues to rethink the very nature of genes. They no longer conceive of a typical gene as a single chunk of DNA encoding a single protein. “It cannot work that way,” Dr. Prohaska said. There are simply too many exceptions to the conventional rules for genes.

It turns out, for example, that several different proteins may be produced from a single stretch of DNA. Most of the molecules produced from DNA may not even be proteins, but another chemical known as RNA. The familiar double helix of DNA no longer has a monopoly on heredity. Other molecules clinging to DNA can produce striking differences between two organisms with the same genes. And those molecules can be inherited along with DNA.

The gene, in other words, is in an identity crisis.
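To make the "several proteins from a single stretch of DNA" point concrete, here is a minimal Python sketch of exon skipping, one simple form of alternative splicing. Everything in it is an assumption for illustration: the exon names, the toy sequences, and the rule that the first and last exon are always kept. It is a cartoon of the idea, not a model of any real gene.

```python
from itertools import combinations

# Toy gene: exon names and sequences are invented for illustration only.
EXONS = {"E1": "ATGGCT", "E2": "GGAACC", "E3": "TTTGCA", "E4": "CCGTAA"}


def splice(exon_names):
    """Join the chosen exons, in genomic order, into one mature transcript."""
    return "".join(EXONS[name] for name in exon_names)


def isoforms(first="E1", last="E4"):
    """Enumerate transcripts that keep the first and last exon and
    either include or skip each internal exon (simple exon skipping)."""
    internal = [name for name in EXONS if name not in (first, last)]
    result = {}
    for k in range(len(internal) + 1):
        # combinations() preserves the original exon order.
        for kept in combinations(internal, k):
            names = (first, *kept, last)
            result["-".join(names)] = splice(names)
    return result


if __name__ == "__main__":
    for label, transcript in isoforms().items():
        print(label, transcript)
```

With two optional internal exons, this one "gene" already yields four distinct transcripts, which is the kind of one-to-many relationship the classical definition has trouble with.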

There was a reference to a fascinating paper: “Genomics Confounds Gene Classification” by Gerstein and Seringhaus (2008). The upshot is that the classical view of the gene/DNA relationship, in which a section of DNA is transcribed into RNA and then translated into a protein, doesn’t account for data about noncoding DNA, noncoding RNA, and alternative splicing:

This iterative one-gene, one-protein, one-function relationship paints a relatively straightforward picture of subcellular life. When describing the function of a given gene in a cell, biologists can conceive an individual protein as a single indivisible unit or node within the larger cellular network. In turn, when mapping genes across species using sequence similarity, they can assume a protein is either fully preserved in various organisms or entirely absent. Thus, related proteins in different organisms can easily be grouped together into consistent families, which can be given simple, unitary descriptions of their function. Thus, the extended dogma expands the central dogma to include regulation, function and conservation.

Complex Reality

To the modern genomics scientist, the classical image of a gene and the extended dogma associated with it are quaint. High-throughput experiments that simultaneously probe the activity of millions of bases in the genome deliver a far less tidy view. First, the process of creating an RNA transcript from a DNA region is more complex than once was imagined. Genes make up only a small fraction of the human genome. But RNA expression studies on human DNA suggest that a substantial amount of the genome outside the boundaries of known or predicted genes is transcribed.

In the quest to accurately describe biological systems, defining basic units is only part of the job. Scientists ultimately want to understand biological function. Function in the genetic sense initially was inferred from the phenotypic effects of genes. A person might have green or blue eyes and a gene related to this characteristic could then be assigned the “eye color” function. Phenotypic function of this sort is most directly shown by deleting or disrupting, or “knocking out,” a particular gene. Disrupting a gene in this way might cause an organism to develop cancer, to change color or to die early. Disabling the yeast mitochondrial gene FZO1, for instance, causes mutant strains to display slow growth and a petite phenotype. But a phenotypic effect doesn’t capture function on the molecular level. To really elucidate the importance of a gene, it’s vital to understand the detailed biochemistry of its products.

Figure 4. Multiple methods exist for capturing gene functions. In a simple hierarchy, at left, a gene is described in single relationships. One unit descends from one “parent”. Directed acyclic graphs (DAGs) capture more complexity. Above, the hierarchy captures that FZO1 plays a role in the biogenesis of cellular parts, but the DAG gives a wider view of the scope of those roles. (Data contributed by QuickGO: ebi.ac.uk/ego/)
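For the informatics-minded, the difference between the two structures in that caption can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the term names below are hypothetical labels chosen for illustration, not actual QuickGO or Gene Ontology annotations for FZO1, and the function names are my own.

```python
# Hypothetical term names for illustration; NOT real annotations for FZO1.

# Simple hierarchy: every term has at most one parent.
TREE_PARENT = {
    "mitochondrion organization": "organelle organization",
    "organelle organization": "cellular component biogenesis",
    "cellular component biogenesis": None,
}

# Directed acyclic graph: a term may have several parents.
DAG_PARENTS = {
    "mitochondrial fusion": ["mitochondrion organization", "membrane fusion"],
    "mitochondrion organization": ["organelle organization"],
    "membrane fusion": ["membrane organization"],
    "organelle organization": ["cellular component biogenesis"],
    "membrane organization": ["cellular component biogenesis"],
    "cellular component biogenesis": [],
}


def tree_ancestors(term):
    """Follow the single parent link upward; yields one linear chain."""
    chain = []
    parent = TREE_PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = TREE_PARENT[parent]
    return chain


def dag_ancestors(term):
    """Collect every term reachable through any parent link."""
    seen = set()
    stack = list(DAG_PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DAG_PARENTS[current])
    return seen


if __name__ == "__main__":
    print(tree_ancestors("mitochondrion organization"))
    print(sorted(dag_ancestors("mitochondrial fusion")))
```

The tree can only ever report one line of descent, while the DAG reports that the same function belongs to both an organelle branch and a membrane branch, which is the "wider view" the caption describes.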

Is the idea of genes as mechanistic “code” part of the problem with the genetic medicine hype?

I have been surfing The New Atlantis now and again after seeing an excellent article pushing back against overly reductionistic models of the mind and brain (“The Limits of Neuro-talk”). Poking around through their intriguing articles, I noticed some new writing taking a critical stance regarding a conceptual shortcut: that genes are readily understandable as a context-free abstract computational code with a well-specified mechanism:

http://www.thenewatlantis.com/publications/getting-over-the-code-delusion

“Certainly the idea of a master program seemed powerful to those who were enamored of it. In their enthusiasm they heralded one revolutionary gene discovery after another — a gene for cystic fibrosis (from which the string of letters above is excerpted), a gene for cancer, a gene for obesity, a gene for depression, a gene for alcoholism, a gene for sexual preference. Building block by building block, genetics was going to show how a living organism could be constructed from mindless, indifferent matter.

And yet the most striking thing about the genomic revolution is that the revolution never happened. Yes, it’s been an era of the most amazing technical achievement, marked by an overwhelming flood of new data. It’s true that we are gaining, even if largely by trial and error, certain manipulative powers. But our understanding of the integrity and unified functioning of the living cell has, if anything, been more obscured than illumined by the torrent of data. “Many of us in the genetics community,” write Linda and Edward McCabe in DNA: Promise and Peril (2008), “sincerely believed that DNA analysis would provide us with a molecular crystal ball that would allow us to know quite accurately the clinical futures of our individual patients.” Unfortunately, as they and many others now acknowledge, the reality did not prove so straightforward.

As minor tokens of the changing consciousness among biologists, one could cite recent articles in the world’s two premier scientific journals, each reflecting upon the 1989 discovery of the “gene for cystic fibrosis.” “The Promise of a Cure: 20 Years and Counting” — so ran the headline in Science, followed by this slightly sarcastic gloss: “The discovery of the cystic fibrosis gene brought big hopes for gene-based medicine; although a lot has been achieved over two decades, the payoff remains just around the corner.” An echo quickly came from Nature, without the sarcasm: “One Gene, Twenty Years: When the cystic fibrosis gene was found in 1989, therapy seemed around the corner. Two decades on, biologists still have a long way to go.”

The story has been repeated for one gene after another, which may be why molecular biologist Tom Misteli offered such a startling postscript to the unbounded optimism of the Human Genome Project. “Comparative genome analysis and large-scale mapping of genome features,” he wrote in the journal Cell, “shed little light onto the Holy Grail of genome biology, namely the question of how genomes actually work in vivo” (that is, in living organisms).

But is this surprising? The human body is not a mere implication of clean logical code in abstract conceptual space, but rather a play of complexly shaped and intricately interacting physical substances and forces. Yet the four genetic letters, in the researcher’s mind, became curiously detached from their material matrix. In many scientific discussions it hardly would have mattered whether the letters of the “Book of Life” represented nucleotide bases or completely different molecular combinations. All that counted were certain logical correspondences between code and protein together with a few bits of regulatory logic, all buttressed by the massive weight of an unsupported assumption: somehow, by neatly executing an immaculate, computer-like DNA logic, the organism would fulfill its destiny as a living creature. The details could be worked out later.”

Venter on what medical benefits have come from the Human Genome Project: “Close to zero to put it precisely”

It looks like there is a full-blown revisionist wave in the making on the medical value of the Human Genome Project and genomics, as you can read in Craig Venter’s recent SPIEGEL interview. Venter has become a very controversial public figure and is nobody’s candidate for an unbiased source. However, given his central role as a bioinformatics/genomics entrepreneur and in developing gene sequencing, it is certainly worth considering his characterizations of genomics vis-a-vis medicine:

SPIEGEL: And what about the fears about the abuse of gene data through insurers or employers, for example? Do you see that as sheer hysteria?

Venter: Abuse is not a question of whether the data is available. It is an issue of laws. You can’t do anything to change the availability of genetic data. Look at this bottle that you have touched — that’s all I need to obtain your entire genetic information.

SPIEGEL: How much would you be able to learn about us by doing so?

Venter: If anything, we don’t really know how to read the genome and it can’t tell us very much right now. So what’s the ethical debate about?

SPIEGEL: The decoding of your personal genome has so far revealed little more than the fact that your ear wax tends to be moist.

Venter: That’s what you say. And what else have I learned from my genome? Very little. We couldn’t even be certain from my genome what my eye color was. Isn’t that sad? Everyone was looking for miracle ‘yes/no’ answers in the genome. “Yes, you’ll have cancer.” Or “No, you won’t have cancer.” But that’s just not the way it is.

SPIEGEL: So the Human Genome Project has had very little medical benefits so far?

Venter: Close to zero to put it precisely.

SPIEGEL: Did it at least provide us with some new knowledge?

Venter: It certainly has. Eleven years ago, we didn’t even know how many genes humans have. Many estimated that number at 100,000, and some went as high as 300,000. We made a lot of enemies when we claimed that there appeared to be considerably fewer — probably closer to the neighborhood of 40,000! And then we found out that there are only half as many. I was just in Stockholm for the 200th anniversary of the Karolinska Institute. The first presentation was about the many achievements the decoding of the genome has brought. Then I spoke and said that this century will be remembered for how little, and not how much, happened in this field.

SPIEGEL: Why is it taking so long for the results of genome research to be applied in medicine?

Venter: Because we have, in truth, learned nothing from the genome other than probabilities. How does a 1 or 3 percent increased risk for something translate into the clinic? It is useless information.

SPIEGEL: There are hundreds of hereditary diseases that can be traced to defects in individual genes. You can determine a lot more than just probabilities through them. But that still hasn’t led to a flood of new treatments.

Venter: There were false expectations. Take Ataxia telangiectasia, for example, a horrible disease. The nervous system degenerates, and people who have it often die in their early teens. The cause is a defect in a single gene, but it is a developmental gene. If your body is built in the wrong way, then you can’t just take a magic pill to rebuild it. If your brain is wired wrong, then it is wired wrong.

SPIEGEL: Who is to blame for those false expectations?

Venter: We were simply always looking at single genes because they were the only genes we had. When people lose their keys at night, they look under the lamp post. Why? Because that’s where you can still see something.

SPIEGEL: But the keys are really located in the dark?

Venter: Exactly. Why did people think there were so many human genes? It’s because they thought there was going to be one gene for each human trait. And if you want to cure greed, you change the greed gene, right? Or the envy gene, which is probably far more dangerous. But it turns out that we’re pretty complex. If you want to find out why someone gets Alzheimer’s or cancer, then it is not enough to look at one gene. To do so, we have to have the whole picture. It’s like saying you want to explore Valencia and the only thing you can see is this table. You see a little rust, but that tells you nothing about Valencia other than that the air is maybe salty. That’s where we are with the genome. We know nothing.

SPIEGEL: Do you think there will be a time when you can extract all this information to yield real medical results?

Venter: For that to happen we need a lot more information: Information about your body’s chemistry, your physiology, your complete medical history, your brain and your entire life. We would need to do that a million times on different people and correlate that data with their genetic information.

SPIEGEL: Will that lead in the end to the kind of personalized medicine that genetic researchers have always touted? Each person would get his or her own personal treatment that is tailored precisely to that person’s genetic make-up?

Venter: That was another one of these silly naïve notions that was out there. It’s not, ‘Oh, we know your genome, we’re going to make this drug for you.’ That will never happen. It is more important that you use the information in the genome about your personal risks and reduce them through intelligent behavior.

SPIEGEL: You have complained about how naïve genome researchers were in the beginning. Will future generations eventually make fun of us in the same way for how naïve we still are today?

Venter: Only time will tell. Nevertheless, we now have what is going to be one of the most important tools for interpreting the human genome: the first synthetic cell. It will enable us to ask questions that would have been inaccessible before.

“Genomics is a way to do science, not medicine”

Over the last ten years or so many of us have been following developments in pharmacogenomics and bioinformatics, wondering if the revolution was truly upon us. The completion of the Human Genome Project, the advances in gene sequencing chips, computational chemistry algorithms, and ever more sophisticated models of signaling pathways in cells, not to mention the impressive capital available to the biotech industry, all made it seem as if a new class of drugs based on genomic variations was on the way. Optimistic thinkers heralded the coming era of personal drugs tailored to individual genomic differences. Certainly the textbook from my 2009 Epidemiology class made it seem as if gene sequencing would play a progressively larger role in modeling variance in human disease outcomes, data that could be fed back into the pharmaceutical development process. A friend getting his PhD in neuroscience who had no prior wet-lab experience told me how easy it is to run the new automated PCR systems to amplify particular sequences of DNA. These and other developments had me convinced that advances in wet-lab science, combined with computational modeling of how drugs interact with receptors and other cellular targets to change gene expression and signaling pathways, would quickly lead to a major new category of medical advances (say, by 2015 or 2020). It seemed the revolution truly was nigh…

As I wrote a few months back, the difficulties that personal genomics companies were having in staying solvent served to dampen optimism somewhat. But more significant than the perilous balance sheet of formerly hyped biotech firms is the accumulating change in the conventional wisdom, suggesting that gene sequencing may not lead to many valuable therapies anytime soon. Certainly the jury is still out on this. But mounting evidence suggests the low-hanging fruit has already been plucked in pharmaceutical design, with the easier molecular targets in the common diseases already identified, leaving the drug companies nervous about pouring billions more into R&D. Most of what I am reading suggests we should still expect great things from applying gene sequencing to pharmacology, but not a new class of breakthrough drugs, much less personalized medicine anytime soon (before, say, 2020 or 2030).

Last spring I went to a well-attended meeting of the Austin Forum called “Bio-tech: the Next Big Thing”, and it was like the Internet bonanza of 1999 all over again. Various scientists and boosters extolled the coming great wave of healthcare benefits resulting from genomic medicine and sundry bioengineering advances. I was teaching a class dealing with this material and thought some dissenting perspectives needed to be aired. At question time, I took the mike and pointed out how vanishingly few new drugs pharmacogenomics, bioinformatics, and related fields have actually delivered after many billions of private and public dollars spent, and asked whether we should not therefore be cautious about big investments in risky projects. To his credit, UT Provost and pharmacologist Steven Leslie agreed with me and added a much-needed tone of sobriety to the otherwise exuberant mood (if anyone has a link to his answer, please forward it, as he is a man worth listening to).

The last few months have seen a certain backlash against the genomic medicine hype. Here is a nice summary from the eminently readable Nicholas Wade in the June 1 New York Times: “A Decade Later, Human Genome Project Yields Few New Cures”:

The pharmaceutical industry has spent billions of dollars to reap genomic secrets and is starting to bring several genome-guided drugs to market. While drug companies continue to pour huge amounts of money into genome research, it has become clear that the genetics of most diseases are more complex than anticipated and that it will take many more years before new treatments may be able to transform medicine.

“Genomics is a way to do science, not medicine,” said Harold Varmus, president of the Memorial Sloan-Kettering Cancer Center in New York, who in July will become the director of the National Cancer Institute.

The last decade has brought a flood of discoveries of disease-causing mutations in the human genome. But with most diseases, the findings have explained only a small part of the risk of getting the disease. And many of the genetic variants linked to diseases, some scientists have begun to fear, could be statistical illusions.

The Human Genome Project was started in 1989 with the goal of sequencing, or identifying, all three billion chemical units in the human genetic instruction set, finding the genetic roots of disease and then developing treatments. With the sequence in hand, the next step was to identify the genetic variants that increase the risk for common diseases like cancer and diabetes.

It was far too expensive at that time to think of sequencing patients’ whole genomes. So the National Institutes of Health embraced the idea for a clever shortcut, that of looking just at sites on the genome where many people have a variant DNA unit. But that shortcut appears to have been less than successful.

The theory behind the shortcut was that since the major diseases are common, so too would be the genetic variants that caused them. Natural selection keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. In 2002 the National Institutes of Health started a $138 million project called the HapMap to catalog the common variants in European, East Asian and African genomes.

With the catalog in hand, the second stage was to see if any of the variants were more common in the patients with a given disease than in healthy people. These studies required large numbers of patients and cost several million dollars apiece. Nearly 400 of them had been completed by 2009. The upshot is that hundreds of common genetic variants have now been statistically linked with various diseases.

But with most diseases, the common variants have turned out to explain just a fraction of the genetic risk. It now seems more likely that each common disease is mostly caused by large numbers of rare variants, ones too rare to have been cataloged by the HapMap.
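The "second stage" Wade describes, checking whether a variant is more common in patients than in healthy people, boils down to a 2x2 table per variant. Here is a minimal Python sketch of that comparison using only the standard library. The counts are invented for illustration and the function names are my own; real studies test hundreds of thousands of variants and have to correct for that, which this sketch does not attempt.

```python
import math


def odds_ratio(case_var, case_ref, ctrl_var, ctrl_ref):
    """Odds of carrying the variant among cases divided by the odds among controls."""
    return (case_var * ctrl_ref) / (case_ref * ctrl_var)


def chi_square_2x2(case_var, case_ref, ctrl_var, ctrl_ref):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = case_var + case_ref + ctrl_var + ctrl_ref
    numerator = n * (case_var * ctrl_ref - case_ref * ctrl_var) ** 2
    denominator = (
        (case_var + case_ref)
        * (ctrl_var + ctrl_ref)
        * (case_var + ctrl_var)
        * (case_ref + ctrl_ref)
    )
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Invented counts: 1,000 patients and 1,000 healthy controls.
    counts = dict(case_var=300, case_ref=700, ctrl_var=250, ctrl_ref=750)
    chi2 = chi_square_2x2(**counts)
    print("odds ratio:", round(odds_ratio(**counts), 2))
    print("chi-square:", round(chi2, 2))
    print("p-value:", p_value_1df(chi2))
```

With these made-up numbers the variant is "statistically linked" to the disease in a single test (the p-value comes out a little above 0.01), yet the odds ratio is only about 1.3. That is the shape of result the excerpt is describing: a detectable association that explains very little of anyone's risk.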

Here are some excerpts from the December 2009 edition of the Economist: “Looming crisis in Human Genetics” by evolutionary psychologist Geoffrey Miller:

Human geneticists have reached a private crisis of conscience, and it will become public knowledge in 2010…

About five years ago, genetics researchers became excited about new methods for “genome-wide association studies” (GWAS). We already knew from twin, family and adoption studies that all human traits are heritable: genetic differences explain much of the variation between individuals. We knew the genes were there; we just had to find them….

In 2010, GWAS fever will reach its peak. Dozens of papers will report specific genes associated with almost every imaginable trait—intelligence, personality, religiosity, sexuality, longevity, economic risk-taking, consumer preferences, leisure interests and political attitudes. The data are already collected, with DNA samples from large populations already measured for these traits. It’s just a matter of doing the statistics and writing up the papers for Nature Genetics. …

GWAS researchers will, in public, continue trumpeting their successes to science journalists and Science magazine. They will reassure Big Pharma and the grant agencies that GWAS will identify the genes that explain most of the variation in heart disease, cancer, obesity, depression, schizophrenia, Alzheimer’s and ageing itself. …

In private, though, the more thoughtful GWAS researchers are troubled. They hold small, discreet conferences on the “missing heritability” problem: if all these human traits are heritable, why are GWAS studies failing so often? …

But the genes typically do not replicate across studies. Even when they do replicate, they never explain more than a tiny fraction of any interesting trait. In fact, classical Mendelian genetics based on family studies has identified far more disease-risk genes with larger effects than GWAS research has so far.

Why the failure? The missing heritability may reflect limitations of DNA-chip design: GWAS methods so far focus on relatively common genetic variants in regions of DNA that code for proteins. They under-sample rare variants and DNA regions translated into non-coding RNA, which seems to orchestrate most organic development in vertebrates. Or it may be that thousands of small mutations disrupt body and brain in different ways in different populations. At worst, each human trait may depend on hundreds of thousands of genetic variants that add up through gene-expression patterns of mind-numbing complexity.