There is an issue I rarely if ever see addressed in the healthcare informatics world, but one that looms much larger in bioinformatics and pharmacogenomics: what is a gene, anyway? My students have been struggling because I make them memorize two definitions: the gene as a unit of heredity, recessive or dominant, and the gene as a DNA sequence that codes for a protein (as in this Wikipedia graphic):
A couple of years ago the New York Times ran an excellent piece about changing views of the gene. They went so far as to characterize the gene as having an “identity crisis”:
new large-scale studies of DNA are causing her and many of her colleagues to rethink the very nature of genes. They no longer conceive of a typical gene as a single chunk of DNA encoding a single protein. “It cannot work that way,” Dr. Prohaska said. There are simply too many exceptions to the conventional rules for genes.
It turns out, for example, that several different proteins may be produced from a single stretch of DNA. Most of the molecules produced from DNA may not even be proteins, but another chemical known as RNA. The familiar double helix of DNA no longer has a monopoly on heredity. Other molecules clinging to DNA can produce striking differences between two organisms with the same genes. And those molecules can be inherited along with DNA.
The gene, in other words, is in an identity crisis.
The article referenced a fascinating paper: “Genomics Confounds Gene Classification” by Gerstein and Seringhaus (2008). The upshot is that the classical view of the gene/DNA relationship, in which a section of DNA is transcribed into RNA and then translated into a protein, doesn’t account for data about noncoding DNA, noncoding RNA, and alternative splicing:
This iterative one-gene, one-protein, one-function relationship paints a relatively straightforward picture of subcellular life. When describing the function of a given gene in a cell, biologists can conceive an individual protein as a single indivisible unit or node within the larger cellular network. In turn, when mapping genes across species using sequence similarity, they can assume a protein is either fully preserved in various organisms or entirely absent. Thus, related proteins in different organisms can easily be grouped together into consistent families, which can be given simple, unitary descriptions of their function. Thus, the extended dogma expands the central dogma to include regulation, function and conservation.
To the modern genomics scientist, the classical image of a gene and the extended dogma associated with it are quaint. High-throughput experiments that simultaneously probe the activity of millions of bases in the genome deliver a far less tidy view. First, the process of creating an RNA transcript from a DNA region is more complex than once was imagined. Genes make up only a small fraction of the human genome. But RNA expression studies on human DNA suggest that a substantial amount of the genome outside the boundaries of known or predicted genes is transcribed.
In the quest to accurately describe biological systems, defining basic units is only part of the job. Scientists ultimately want to understand biological function. Function in the genetic sense initially was inferred from the phenotypic effects of genes. A person might have green or blue eyes and a gene related to this characteristic could then be assigned the “eye color” function. Phenotypic function of this sort is most directly shown by deleting or disrupting, or “knocking out,” a particular gene. Disrupting a gene in this way might cause an organism to develop cancer, to change color or to die early. Disabling the yeast mitochondrial gene FZO1, for instance, causes mutant strains to display slow growth and a petite phenotype. But a phenotypic effect doesn’t capture function on the molecular level. To really elucidate the importance of a gene, it’s vital to understand the detailed biochemistry of its products.
Figure 4. Multiple methods exist for capturing gene functions. In a simple hierarchy, at left, a gene is described in single relationships. One unit descends from one “parent”. Directed acyclic graphs (DAGs) capture more complexity. Above, the hierarchy captures that FZO1 plays a role in the biogenesis of cellular parts, but the DAG gives a wider view of the scope of those roles. (Data contributed by QuickGO: ebi.ac.uk/ego/)
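For readers coming from the informatics side, the difference between the two structures in the caption is easy to see in a few lines of Python. This is only a rough sketch: the term labels and parent links below are illustrative placeholders I made up for the example, not annotations retrieved from QuickGO or the Gene Ontology, and the helper names `ancestors` and `is_simple_hierarchy` are hypothetical. The point is just that when a term may have more than one parent, walking upward from a single annotation reaches more of the graph than a one-parent hierarchy allows.

```python
# Toy term graphs stored as child -> set of parents.
# All labels and links are illustrative placeholders, NOT real QuickGO/GO data.

# Simple hierarchy: every term has at most one parent.
TREE_PARENTS = {
    "mitochondrial fusion": {"mitochondrion organization"},
    "mitochondrion organization": {"organelle organization"},
    "organelle organization": {"cellular process"},
    "cellular process": set(),
}

# Directed acyclic graph: a term may have several parents.
DAG_PARENTS = {
    "mitochondrial fusion": {"mitochondrion organization", "membrane fusion"},
    "mitochondrion organization": {"organelle organization"},
    "membrane fusion": {"membrane organization"},
    "organelle organization": {"cellular process"},
    "membrane organization": {"cellular process"},
    "cellular process": set(),
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_simple_hierarchy(parents):
    """True when no term has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())


if __name__ == "__main__":
    annotation = "mitochondrial fusion"  # hypothetical annotation for FZO1
    print("hierarchy:", sorted(ancestors(annotation, TREE_PARENTS)))
    print("DAG:      ", sorted(ancestors(annotation, DAG_PARENTS)))
```

In the toy hierarchy the annotation rolls up through one chain of three broader terms; in the toy DAG the same annotation also rolls up through the membrane branch, which is the "wider view" the caption describes.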