Evolution of the Genetic Code

The Origin and Evolution of the Genetic Code, Protein Structure, and Protein Function

Multiple Open Reading Frames. In principle, any segment of double-helical DNA can be read in six different ways (three reading frames on each of its two strands) to produce six proteins with completely different sequences, folds, and functions. A reading frame that encodes a protein product is called an open reading frame (ORF). It has been assumed that, over the course of three billion years of evolution, only one of the six possible frames of a gene has remained able to produce a viable protein. Contrary to this assumption, we have discovered that 18% of all the genes in the gene bank have retained the potential to produce more than one protein by reading alternate frames of the gene. These genes have multiple open reading frames (MORFs). We have shown that genes having MORFs occur 200 times more often than would be expected at random.

Codon Bias. We have also shown that over 90% of the genes that have MORFs have a severe bias in their use of the genetic code. In human proteins generally, the use of the 64 codons that define the 20 amino acids is random. In the genes that have MORFs, however, only half of the 64 codons are used. No codon bias of comparable severity or frequency has ever been detected. There is accumulating evidence that the codons containing two or three G or C bases were defined first. We have found that MORFs and GC codon bias are most pronounced in a few protein families that are present in all species of bacteria and eukaryotes (yeast, plants, insects, and animals) and are vital to basic life processes common to all living things. These families include ribosomal proteins, ATP-binding proteins, SCORs, and heat shock proteins.

Amino Acid Bias. We discovered that proteins that have MORFs and GC codon bias also have a pronounced amino acid bias. This suggests that the most ancient proteins were not only encoded by a subset of the genetic code but were also composed of a subset of the 20 amino acids. We conclude that tryptophan and cysteine were the last of the 20 amino acids to appear in proteins.

A Primordial Two-Letter Code. Our most recent analysis of the molecular details of the phenomenon of gene duplication has led to the identification of an ancient family of highly symmetric, barrel-shaped molecules that were originally encoded by just 20 of the 64 codons. These ancient genes support the possibility of a two-letter genetic code that preceded the three-letter code two billion years ago. This work could revolutionize how we think about the evolution of protein sequences and folding, and it could help us design proteins with specific functions.
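The analyses described above rest on a few simple sequence computations, and a student joining the project would likely start from something like the following. This is a minimal Python sketch under stated assumptions, not the authors' method: it uses the standard genetic code (NCBI translation table 1), treats a frame as "open" when its longest stop-free stretch reaches a hypothetical `min_codons` threshold, and calls a gene MORF-bearing when at least two of its six frames pass that test. The source does not state its actual criteria, so all function names and thresholds here are illustrative. The codon-bias helpers report what fraction of the 64 codons a gene uses and what fraction of its codons contain two or three G/C bases; `amino_acid_counts` can be used to look for the tryptophan (W) and cysteine (C) depletion mentioned above.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' is a stop, 'X' an unrecognized codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then -1, -2, -3 on its complement."""
    rc = reverse_complement(seq)
    return [translate(strand[offset:]) for strand in (seq, rc) for offset in range(3)]


def longest_open_stretch(protein: str) -> int:
    """Length of the longest run of residues not interrupted by a stop codon."""
    return max(len(run) for run in protein.split("*"))


def count_open_frames(seq: str, min_codons: int) -> int:
    """Number of the six frames whose longest stop-free run is >= min_codons."""
    return sum(longest_open_stretch(p) >= min_codons
               for p in six_frame_translations(seq))


def has_morfs(seq: str, min_codons: int) -> bool:
    """Hypothetical MORF test (assumption, not from the source): >= 2 open frames."""
    return count_open_frames(seq, min_codons) >= 2


def codons(seq: str) -> list[str]:
    """Complete codons in frame +1."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def fraction_of_codons_used(seq: str) -> float:
    """Fraction of the 64 codons that occur at least once in frame +1."""
    return len({c for c in codons(seq) if c in CODON_TABLE}) / 64


def gc_rich_codon_fraction(seq: str) -> float:
    """Fraction of frame +1 codons containing two or three G/C bases."""
    cs = codons(seq)
    rich = sum(1 for c in cs if sum(b in "GC" for b in c) >= 2)
    return rich / len(cs) if cs else 0.0


def amino_acid_counts(seq: str) -> Counter:
    """Residue counts of the frame +1 translation, excluding stops."""
    return Counter(aa for aa in translate(seq) if aa != "*")
```

The `min_codons` threshold is the main design choice: a very low value makes almost any short sequence look MORF-bearing, so in practice it would be set relative to the length of the annotated protein, and the result compared against shuffled sequences to judge whether the MORF frequency exceeds chance.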

Research Project Information

Disciplines: Biochemistry, Bioinformatics and Computational Biology, Biology, Computer Science, Software Engineering, Web Development, Genomics/Proteomics
Student Skill-Set Needed:
- Understanding of bioinformatics analysis tools, including BLAST, PROSITE, and ClustalW
- Background/experience in software development
- Data mining/processing
- Pattern matching/recognition
- Web service development
- Unix systems/scripting languages
- P
Compensation: Work Study
Available: Fall, Spring


For further information on this opportunity, or to apply, contact:

Faculty Member: Dr. William Duax
Department: Structural Biology
Office: Hauptman-Woodward Medical Research Institute, 700 Ellicott St., Buffalo, NY
Phone: (716) 898-8600