gffread
gffread to protein
Hello, I tried to extract the protein FASTA for the coding sequences from a gtf file, but the output looks like the example below. Command used: gffread -y output_protein.fasta -g genome.fasta transcripts.gff3
Output: LYT.WSHVP.QTLQSHR.CPSRLLELCSSPLLQMTIHGA.YSFGE.HIMYDIKLDNLQYVRSW.LRKLKL LVQLDKFRTCHP.TPC.SS.TSLDRLHQPAHACLWCGQQWWQYRGLVRQSQSSR.QRSYRTQLVGWREQ. LPGWCL
I'm curious what these dots (.) refer to, and how can I extract the proper protein FASTA sequences? Kindly suggest!
Regards, B
Each dot is an internal (in-frame) stop codon; see Issue 14.
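To see which transcripts are affected, a small script can scan the protein FASTA for stop symbols that are not the last residue. This is only a sketch, not part of gffread: the function names (`read_fasta`, `internal_stops`, `split_by_internal_stops`) and the example records are made up for illustration, and it assumes stops appear as `.` (the default) or `*` (with -S).

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def internal_stops(seq, stop_chars=".*"):
    """Return 0-based positions of stop symbols, ignoring a single terminal stop."""
    body = seq[:-1] if seq and seq[-1] in stop_chars else seq
    return [i for i, aa in enumerate(body) if aa in stop_chars]


def split_by_internal_stops(records):
    """Separate records into clean sequences and flagged stop positions."""
    clean, flagged = {}, {}
    for name, seq in records.items():
        positions = internal_stops(seq)
        if positions:
            flagged[name] = positions
        else:
            clean[name] = seq
    return clean, flagged


# Hypothetical records standing in for output_protein.fasta.
example = """>tx1
MKLV.QTL
>tx2
MKLVQTL.
>tx3
MKLVQTL
"""

clean, flagged = split_by_internal_stops(read_fasta(example))
print("flagged:", flagged)
print("clean:", sorted(clean))
```

On a real file you would pass `open("output_protein.fasta").read()` to `read_fasta`. Many flagged records usually point to a problem in the annotation (wrong CDS phase or strand, or a genome FASTA that does not match the annotation) rather than in the translation step. gffread's help text also lists options such as -V and -J for discarding transcripts with in-frame stops; check `gffread -h` for your version before relying on them.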
According to the method mentioned in Issue 14, adding -S only changes . to *. But a stop codon in the middle of the mRNA sequence can be translated into an amino acid; for example, UGA is U (selenocysteine) and UAG is O (pyrrolysine). What should be done in this situation?
I have the same problem, but in my case the terminating codon is not included in the gbf file that was converted to gtf format.