
gffread to protein

Open bijendrabio opened this issue 1 year ago • 3 comments

Hello, I tried to extract the protein FASTA for the coding sequences from a GTF file, but the output looks like the example below. Command used: gffread -y output_protein.fasta -g genome.fasta transcripts.gff3

Output: LYT.WSHVP.QTLQSHR.CPSRLLELCSSPLLQMTIHGA.YSFGE.HIMYDIKLDNLQYVRSW.LRKLKL LVQLDKFRTCHP.TPC.SS.TSLDRLHQPAHACLWCGQQWWQYRGLVRQSQSSR.QRSYRTQLVGWREQ. LPGWCL

I am curious what these dots (.) refer to, and how I can extract proper protein FASTA sequences for the coding regions. Kindly suggest!

Regards, B

bijendrabio avatar Mar 28 '23 20:03 bijendrabio

Each dot is an internal stop codon; see Issue 14.

alephreish avatar May 11 '23 13:05 alephreish
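
To see which transcripts are affected, the protein FASTA written by `-y` can be scanned for stop symbols that are not the last residue. The following is a minimal sketch, not part of gffread: the helper names (`read_fasta`, `internal_stops`, `report`) are hypothetical, and it assumes stops are written as `.` (the default) or `*` (with `-S`).

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: gffread writes "." for a stop by default and "*" when -S is given.
STOP_CHARS = {".", "*"}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the id.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def internal_stops(seq: str) -> List[int]:
    """Return 0-based positions of stop symbols, ignoring the final residue."""
    return [i for i, aa in enumerate(seq[:-1]) if aa in STOP_CHARS]


def report(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each record id that has an internal stop to the stop positions."""
    hits = {}
    for name, seq in read_fasta(lines):
        positions = internal_stops(seq)
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Toy records; with real data, pass an open file such as output_protein.fasta.
    example = [">tx1", "MKT.LLA", "GH.", ">tx2", "MSTQ"]
    print(report(example))  # {'tx1': [3]}
```

Many internal stops in most records usually point to a wrong frame or CDS phase in the annotation rather than to real premature stops. The gffread usage text also lists `-V` and `-J` for discarding transcripts with in-frame stop codons; check `gffread -h` on your installed version before relying on them.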

According to the method mentioned in Issue 14, adding -S only changes . to *. But some stop codons in the middle of an mRNA sequence are actually translated into amino acids: for example, UGA can encode selenocysteine (U) and UAG can encode pyrrolysine (O). What should be done in this situation?

Jason-bot-stack avatar Mar 01 '24 14:03 Jason-bot-stack
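
One possible workaround for the recoding case above is to translate the spliced CDS yourself (for example, from the FASTA that gffread writes with `-x`) and override selected internal stop codons. This is a hedged sketch, not a gffread feature: `translate` and its `recode` parameter are hypothetical names, the standard genetic code is assumed, and you have to decide which transcripts and codons are really recoded, since that cannot be inferred from the codon alone.

```python
from typing import Dict, Optional

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, recode: Optional[Dict[str, str]] = None) -> str:
    """Translate a CDS in frame 0, optionally recoding internal stop codons.

    `recode` maps a stop codon (DNA alphabet) to a one-letter amino acid,
    e.g. {"TGA": "U"}. The final codon is never recoded, so a terminal stop
    is still written as "*". Codons with ambiguous bases become "X".
    """
    recode = recode or {}
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    protein = []
    for idx in range(n_codons):
        codon = seq[3 * idx: 3 * idx + 3]
        aa = CODON_TABLE.get(codon, "X")
        is_last = idx == n_codons - 1
        if aa == "*" and not is_last and codon in recode:
            aa = recode[codon]
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    cds = "ATGTGAAAATAA"  # toy CDS: ATG TGA AAA TAA
    print(translate(cds))                 # M*K*
    print(translate(cds, {"TGA": "U"}))   # MUK*
```

Applying the override only to transcripts already known to be selenoproteins (or pyrrolysine-containing proteins) keeps genuine premature stops visible in every other record.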

I have the same problem, but in my case the terminating codon is not included in the gbf file that I converted to GTF format.

bloomachine avatar Aug 08 '24 09:08 bloomachine