CDS and protein

Open Gon1976 opened this issue 1 year ago • 0 comments

I installed gffread and first ran this command: gffread -w transcript.fa -g Genome.fasta Anotacion.gff. The output looks OK (all the CDS transcripts start with ATG).

Then I tried: gffread -y proteins.pep -g Genome.fasta Anotacion.gff

In this case the proteins don't start with M, so the correct ORF is lost. I noticed that in the transcript FASTA file the transcript headers carry different CDS coordinates: 3500 transcripts have CDS=1-end, but 12356 transcripts have CDS=2-end and 4015 have CDS=3-end.

So only the 3500 transcripts whose CDS coordinates run from 1 to the end give a correct protein. For the 12356 transcripts whose CDS starts at 2, the protein is translated from an incorrect start (generating stop codons because of the incorrect ORF).
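For anyone who wants to reproduce the counts above, here is a minimal Python sketch of how the CDS start coordinates can be tallied from the FASTA headers. It assumes each header contains a tag of the form CDS=start-end, as seen in my transcript.fa; the function name tally_cds_starts and the regular expression are just illustrative choices, not part of gffread.

```python
import re
import sys
from collections import Counter

# Assumption: headers look like ">transcript_id CDS=2-901" (tag format taken
# from the transcript FASTA described above; everything else is hypothetical).
CDS_RE = re.compile(r"\bCDS=(\d+)-(\d+)")


def tally_cds_starts(fasta_lines):
    """Count how many FASTA headers report each CDS start coordinate.

    Headers without a CDS= tag are counted under the key None.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines, only headers matter here
        match = CDS_RE.search(line)
        start = int(match.group(1)) if match else None
        counts[start] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python tally_cds.py transcript.fa
    with open(sys.argv[1]) as handle:
        for start, n in tally_cds_starts(handle).most_common():
            print(f"CDS start {start}: {n} transcripts")
```

This only reports what the headers say. It does not change the GFF or the translation, so it is a diagnostic and not a fix.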

My question is how to correct the GFF file or the starting point, because gffread is the first step in the PanExplorer pipeline and I need all the proteins to continue and get the results.

Gon1976, Dec 18 '23 18:12