Consider the following FASTA file, which contains a single
sequence of 267 bases:
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT
CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA
ttCTTNNCTtTGAAGAAAAAAAAAAAA
(In reality, the transcribed sequence would be RNA, not
DNA, so all of the 'T's in the sequence would be 'U's
instead, but we are focusing on the task of getting a
translation from a sequence in a FASTA file, so we are
skipping that step.)
Remember that the sequence is broken across lines only for
purposes of storage; what this text represents is a single
long sequence of bases.
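As a minimal sketch of that idea, the lines of a FASTA record can be joined back into one string before any further work. The function name `read_single_fasta` and the choice to hold the file contents in a string are assumptions made for illustration; a real script would more likely read from a file or use an existing FASTA parser.

```python
# The example file from the text, kept in a string for illustration.
FASTA_TEXT = """\
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT
CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA
ttCTTNNCTtTGAAGAAAAAAAAAAAA
"""


def read_single_fasta(text):
    """Return (header, sequence) for FASTA text holding one record.

    Hypothetical helper: it assumes exactly one record and does no
    validation of the bases themselves.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    header = lines[0][1:]            # drop the leading '>'
    sequence = "".join(lines[1:])    # line breaks are storage only
    return header, sequence


header, sequence = read_single_fasta(FASTA_TEXT)
print(header)         # disease|a very phony disease protein
print(len(sequence))  # 267
```

The case of each base is preserved here, so any later search for codons has to be case-insensitive.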
Step 1 of the translation process is to identify the start
codon, ATG, which codes for methionine. In this case, it
begins with the last base of the second sequence line and
continues onto the third:
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT
CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA
ttCTTNNCTtTGAAGAAAAAAAAAAAA
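Step 1 can be sketched as a case-insensitive search over the joined sequence. The name `find_start_codon` is a hypothetical helper, and the sketch assumes that the first ATG found is the one to use, as in this example.

```python
# The example sequence with its line breaks removed.
SEQUENCE = (
    "NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA"
    "aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA"
    "TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT"
    "CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA"
    "ttCTTNNCTtTGAAGAAAAAAAAAAAA"
)


def find_start_codon(sequence):
    """Return the 0-based index of the first ATG, or -1 if there is none."""
    # Upper-casing makes the search ignore the mixed case in the file.
    return sequence.upper().find("ATG")


start = find_start_codon(SEQUENCE)
print(start)                        # 119
print(SEQUENCE[start:start + 3])    # ATG
```

Index 119 is the last base of the second 60-base line (indices 60 to 119), which matches the description above: the A sits at the end of one line and the TG at the start of the next.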
Step 2 is to translate from that point, one three-base codon at a time. From here on, the remaining sequence is no longer read as individual bases; it is read only in groups of three, because that is how the cell's translation machinery reads it:
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TG gaa GCC Tct tta gAg tcG Tga Acc Caa CCG AgA Agg gAA AAt
AcC GTG tGT TTT tGT TCC NNN cNC ctG GGT ACG CGG CNC AGC tGN
NNG aCG ctT NTT CGa GTg CCC CNN NNT TcA Att CTT NNC TtT GAA
GAA AAA AAA AAA A
First 4 codons = ATG gaa GCC Tct = M E A S
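The grouping into codons can be sketched as simple string slicing. The helper name `split_codons` is an assumption for illustration, and the input below is only the first 30 bases of the example, starting at the ATG.

```python
# First 30 bases of the example, beginning at the start codon.
CODING = "ATGgaaGCCTctttagAgtcGTgaAccCaa"


def split_codons(sequence, start=0):
    """Split sequence into three-base codons, beginning at index start.

    Any one or two leftover bases at the end are dropped, because they
    do not form a complete codon.
    """
    usable_end = len(sequence) - (len(sequence) - start) % 3
    return [sequence[i:i + 3] for i in range(start, usable_end, 3)]


codons = split_codons(CODING)
print(codons[:4])  # ['ATG', 'gaa', 'GCC', 'Tct']
```

Applied to the full sequence from the start codon onward, the same slicing would leave a single trailing A unused, which is the lone base at the end of the listing above.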
Step 3 is to continue translating until you encounter a stop codon (TAA, TAG, or TGA). Remember that it must line up with the other codons in this sequence. The first occurrence of TAG in this sequence (the last two bases of tta followed by the first base of gAg) is not a stop codon, because it does not line up with the codons being translated. (This is what is referred to as "being out of frame". Frame in this context means the codons you get when you break the sequence into groups of three bases, starting from the Met.) The first stop codon that is in the correct frame is Tga, immediately after tcG.
>disease|a very phony disease protein
NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA
aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA
TG gaa GCC Tct tta gAg tcG Tga Acc Caa CCG AgA Agg gAA AAt
AcC GTG tGT TTT tGT TCC NNN cNC ctG GGT ACG CGG CNC AGC tGN
NNG aCG ctT NTT CGa GTg CCC CNN NNT TcA Att CTT NNC TtT GAA
GAA AAA AAA AAA A
Translated codons = ATG gaa GCC Tct tta gAg tcG = M E A S L E S
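One way to check whether a stop triplet is in frame is to look at its offset from the start codon: it lines up only if that offset is a multiple of three. The sketch below uses a hypothetical helper, `find_stop_triplets`, on the first 30 bases of the example starting at the ATG.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

# First 30 bases of the example, beginning at the start codon.
CODING = "ATGgaaGCCTctttagAgtcGTgaAccCaa"


def find_stop_triplets(coding):
    """List (offset, triplet, in_frame) for every stop triplet found.

    The offset is measured from the start codon at index 0, so a
    triplet is in frame only when its offset is divisible by 3.
    """
    seq = coding.upper()
    hits = []
    for offset in range(len(seq) - 2):
        triplet = seq[offset:offset + 3]
        if triplet in STOP_CODONS:
            hits.append((offset, triplet, offset % 3 == 0))
    return hits


for offset, triplet, in_frame in find_stop_triplets(CODING):
    print(offset, triplet, "in frame" if in_frame else "out of frame")
# 13 TAG out of frame
# 21 TGA in frame
```

The TAG at offset 13 straddles two codons, so it is skipped; the TGA at offset 21 is the eighth codon and ends translation.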
At this point you are done translating and the rest of
the sequence is ignored. The translated sequence is
represented by concatenating the single-letter codes for the
translated amino acids.
Final translated protein = MEASLES
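Putting the three steps together, a minimal sketch of the whole procedure might look like the following. The function name `translate_from_first_atg` is hypothetical, the codon table is the standard genetic code written in the compact TCAG ordering, and reporting codons that contain N as X is an assumption of this sketch rather than something the text specifies.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The example sequence with its line breaks removed.
SEQUENCE = (
    "NNNGgtCNNNcgGcCANNTAccAGgTCGCAcTTtGtttNcANGGNNNNNNNGTTTttTAA"
    "aaCCNAGGtAccGCgGNNCAcATTNGGATTTCCTCgCGGGTgANGCNCGGGTgANGCNCA"
    "TGgaaGCCTctttagAgtcGTgaAccCaaCCGAgAAgggAAAAtAcCGTGtGTTTTtGTT"
    "CCNNNcNCctGGGTACGCGGCNCAGCtGNNNGaCGctTNTTCGaGTgCCCCNNNNTTcAA"
    "ttCTTNNCTtTGAAGAAAAAAAAAAAA"
)


def translate_from_first_atg(sequence):
    """Translate from the first ATG up to the first in-frame stop codon."""
    seq = sequence.upper()
    start = seq.find("ATG")               # Step 1: locate the start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):   # Step 2: read in threes
        amino = CODON_TABLE.get(seq[i:i + 3], "X")  # X: ambiguous codon
        if amino == "*":                  # Step 3: stop codon ends it
            break
        protein.append(amino)
    return "".join(protein)


print(translate_from_first_atg(SEQUENCE))  # MEASLES
```

Because the loop advances three bases at a time from the start codon, out-of-frame stop triplets are never examined, so no separate frame check is needed.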