Translation of an ORF


In the simplest, most common case, a gene is transcribed (copied) from the genome, and the Open Reading Frame (ORF) is then translated from the transcribed mRNA to form a protein. The translation apparatus begins translating when it finds a Methionine (Met) codon, which is the sequence ATG, in the transcribed mRNA. Starting with the initial Met, the three-base codons are translated into amino acids, which are chained together to form a protein. Here, we present an example of taking a DNA sequence from a FASTA file and applying those rules.

Consider the following FASTA file, which contains a single sequence of 267 bases:
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT
CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA
ttCTTNNCTtTGAAGAAAAAAAAAAAA
(In reality, the transcribed sequence would be RNA, not DNA, so all of the 'T's in the sequence would be 'U's instead, but we are focusing on the task of getting a translation from a sequence in a FASTA file, so we are skipping that step.)

Remember that the sequence is broken across lines only for purposes of storage; what this text represents is a single long sequence of bases.
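As a minimal sketch of that idea, the Python below joins the sequence lines of a single-record FASTA text into one string. The function name `read_single_fasta` and the variable `FASTA_TEXT` are illustrative names chosen here, and the code assumes the file holds exactly one record.

```python
# The FASTA record from the example, embedded as text for illustration.
FASTA_TEXT = """\
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT
CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA
ttCTTNNCTtTGAAGAAAAAAAAAAAA
"""


def read_single_fasta(text):
    """Return (header, sequence) for FASTA text assumed to hold one record."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
        else:
            chunks.append(line)  # sequence lines are simply concatenated
    return header, "".join(chunks)


header, sequence = read_single_fasta(FASTA_TEXT)
print(header)
print(len(sequence))  # 267
```

The line breaks carry no biological meaning, so they are discarded when the lines are joined.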

Step 1 of the translation process is to identify the first Methionine codon (ATG). In this case, it starts at the end of the second sequence line (the A) and continues on the third (the TG):

>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT
CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA
ttCTTNNCTtTGAAGAAAAAAAAAAAA
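One way to locate that codon programmatically is sketched below. It assumes the sequence lines have already been joined into one string, and it upper-cases the sequence first because the file mixes upper and lower case. The names `SEQUENCE` and `find_start` are illustrative.

```python
# The example sequence with the FASTA line breaks removed.
SEQUENCE = (
    "NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA"
    "aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA"
    "TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT"
    "CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA"
    "ttCTTNNCTtTGAAGAAAAAAAAAAAA"
)


def find_start(seq):
    """Return the 0-based index of the first ATG, or -1 if there is none."""
    return seq.upper().find("ATG")


start = find_start(SEQUENCE)
# The file wraps at 60 bases per line, so divmod gives (line, column), both 0-based.
line_index, column = divmod(start, 60)
print(start)               # 119
print(line_index, column)  # 1 59
```

Index 119 is the last base of the second sequence line, which matches the position described in Step 1.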

Step 2 is to translate from that point, one three-base codon at a time. The remaining sequence is no longer considered as a string of individual bases; it can only be viewed as sets of three, because that is how the cell's translation mechanism reads it:

>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TG gaa GCC Tct tta gAg tcG Tga Acc Caa CCG AgA Agg gAA AAt
AcC GTG tGT TTT tGT TCC NNN cNC ctG GGT ACG CGG CNC AGC tGN
NNG aCG ctT NTT CGa GTg CCC CNN NNT TcA Att CTT NNC TtT GAA
GAA AAA AAA AAA A
First 4 codons = ATG gaa GCC Tct = M E A S
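The grouping into codons can be sketched as follows, assuming the start index has already been found. The helper name `codons_from` is illustrative; any trailing bases that do not fill a complete codon are dropped.

```python
# The example sequence with the FASTA line breaks removed.
SEQUENCE = (
    "NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA"
    "aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA"
    "TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT"
    "CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA"
    "ttCTTNNCTtTGAAGAAAAAAAAAAAA"
)


def codons_from(seq, start):
    """Split seq into upper-case 3-base codons beginning at index start."""
    seq = seq.upper()
    # Stop at len(seq) - 2 so that only complete codons are produced.
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


start = SEQUENCE.upper().find("ATG")
codons = codons_from(SEQUENCE, start)
print(codons[:4])   # ['ATG', 'GAA', 'GCC', 'TCT']
print(len(codons))  # 49
```

The 49 complete codons, with one base left over, match the grouped display above.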

Step 3 is to continue translating until you encounter a stop codon (TAA, TAG, or TGA). Remember that it must line up with the other codons in this sequence. The first occurrence of TAG in this sequence, which straddles the codons tta and gAg, is not a stop codon, because it does not line up evenly with the codons being translated. (This is what is referred to as "being out of frame". Frame in this context means the codons you get when you break the sequence into groups of three bases, starting from the Met.) The first stop codon that is in the correct frame is Tga, the codon immediately after tcG.

>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TG gaa GCC Tct tta gAg tcG Tga Acc Caa CCG AgA Agg gAA AAt
AcC GTG tGT TTT tGT TCC NNN cNC ctG GGT ACG CGG CNC AGC tGN
NNG aCG ctT NTT CGa GTg CCC CNN NNT TcA Att CTT NNC TtT GAA
GAA AAA AAA AAA A
Translated codons = ATG gaa GCC Tct tta gAg tcG = M E A S L E S
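The difference between an out-of-frame TAG and an in-frame stop codon can be checked with a short sketch. It uses only the first 30 bases of the example starting at the Met, and the helper name `first_in_frame_stop` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Excerpt of the example sequence, beginning at the initial ATG.
EXCERPT = "ATGgaaGCCTctttagAgtcGTgaAccCaa".upper()


def first_in_frame_stop(seq):
    """Return the index of the first stop codon in frame with index 0, or -1."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


tag_index = EXCERPT.find("TAG")            # plain text search, ignores the frame
stop_index = first_in_frame_stop(EXCERPT)  # steps three bases at a time

print(tag_index, tag_index % 3)    # 13 1  (offset 1, so out of frame)
print(stop_index, stop_index % 3)  # 21 0  (in frame: the TGA codon)
```

An offset that is not a multiple of three from the Met means the triplet is out of frame and is not read as a stop.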

At this point you are done translating and the rest of the sequence is ignored. The translated sequence is represented by concatenating the single-letter codes for the translated amino acids.

Final translated protein = MEASLES
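Putting the three steps together, a minimal end-to-end sketch might look like the following. The names `CODON_TABLE` and `translate_orf` are illustrative, the table is the standard genetic code written in DNA letters, and reporting any codon containing N as X is an assumption of this sketch rather than part of the walkthrough.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa
    for bases, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

# The example sequence with the FASTA line breaks removed.
SEQUENCE = (
    "NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA"
    "aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA"
    "TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT"
    "CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA"
    "ttCTTNNCTtTGAAGAAAAAAAAAAAA"
)


def translate_orf(seq):
    """Translate from the first ATG to the first in-frame stop codon."""
    seq = seq.upper()
    start = seq.find("ATG")  # Step 1: locate the initial Met
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):  # Step 2: read codon by codon
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X = ambiguous codon
        if amino_acid == "*":  # Step 3: stop at the first in-frame stop
            break
        protein.append(amino_acid)
    return "".join(protein)


protein = translate_orf(SEQUENCE)
print(protein)  # MEASLES
```

If no in-frame stop codon is found, this sketch simply translates to the end of the sequence; a stricter version could treat that case as an error.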