spaln

Two errors in GFF3 output

Open • Secretloong opened this issue 5 years ago • 1 comment

First, some results are written with a 0-based start in the GFF3 output. I think this is caused by the deletion of the first base. For example, this line from the GFF3 output:

chr1_AADN03009696_random ALN gene 0 701 460 - . ID=gene01217;Name=chr1_AADN03009696_random_0
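GFF3 columns 4 and 5 are 1-based, inclusive start and end coordinates, so a start of 0 cannot be valid. A quick way to spot such lines is a small checker like the sketch below. The function name `check_gff3_coords` is hypothetical (not part of spaln). Because the line above was copied from a web page, its tabs may have collapsed into spaces, so the sketch falls back to whitespace splitting.

```python
def check_gff3_coords(line: str) -> list:
    """Return a list of coordinate problems found in one GFF3 feature line."""
    # GFF3 is tab-delimited; fall back to whitespace if tabs were lost in copy/paste.
    cols = line.rstrip("\n").split("\t") if "\t" in line else line.split(None, 8)
    if len(cols) != 9:
        return [f"expected 9 columns, got {len(cols)}"]
    start, end, phase = int(cols[3]), int(cols[4]), cols[7]
    problems = []
    if start < 1:
        problems.append(f"start={start} is not a valid 1-based coordinate")
    if end < start:
        problems.append(f"end={end} is smaller than start={start}")
    # Only CDS features must carry a numeric phase (0, 1 or 2).
    if cols[2] == "CDS" and phase not in {"0", "1", "2"}:
        problems.append(f"CDS phase must be 0, 1 or 2, got {phase!r}")
    return problems


line = ("chr1_AADN03009696_random ALN gene 0 701 460 - . "
        "ID=gene01217;Name=chr1_AADN03009696_random_0")
print(check_gff3_coords(line))
```

For the example line this reports only the invalid start of 0.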

Second, the conversion loses the correct ORF information. The capital letters are extracted for the GFF3 output, but the ORF is broken, so I cannot get the correct peptide sequences from the GFF3 CDS sequences. For example:

extracted pep: LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKWPDHGUHTEGSTSSGK
correct pep: LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKGGQTMGDIQKAQPHQE
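To find where in the CDS the reading frame goes wrong, it helps to locate the first residue at which the two peptides disagree. The helper below, `first_divergence`, is a hypothetical sketch. It assumes the extracted CDS starts at phase 0, so residue `i` (0-based) begins at nucleotide offset `3 * i`. The frame-shift itself may lie a few bases upstream of that offset, because a shifted frame can, by chance, encode the same residue for a codon or two.

```python
from typing import Optional


def first_divergence(observed: str, expected: str) -> Optional[int]:
    """Return the 0-based index of the first residue where two peptides differ.

    Returns None if the sequences are identical; if one is a prefix of the
    other, the index just past the shorter sequence is returned.
    """
    for i, (a, b) in enumerate(zip(observed, expected)):
        if a != b:
            return i
    if len(observed) != len(expected):
        return min(len(observed), len(expected))
    return None


extracted = "LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKWPDHGUHTEGSTSSGK"
correct = "LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKGGQTMGDIQKAQPHQE"
i = first_divergence(extracted, correct)
# Assuming a phase-0 CDS, residue i starts at nucleotide offset 3 * i.
print(f"first mismatch at residue {i + 1}, CDS offset ~{3 * i}")
# prints "first mismatch at residue 35, CDS offset ~102"
```

Here the peptides agree through "...KEEK" and diverge at residue 35, which points to the region around nucleotide 102 of the concatenated CDS.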

Secretloong, May 16 '19 09:05

1st case: Spaln does not carefully handle cases where the first or the last codon is incomplete. The -LS option may help trim out such incomplete terminal codons. By the way, since GFF uses a 1-based coordinate system, a coordinate of 0 indicates some exceptional situation.
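The idea of trimming incomplete terminal codons can be sketched as follows. This is not how spaln's -LS option is implemented; `trim_to_complete_codons` is a hypothetical helper for illustration only. It assumes the CDS is given 5'→3' on its own strand and that `phase` follows the GFF3 definition: the number of bases to remove from the start of the feature to reach the first base of the next complete codon.

```python
def trim_to_complete_codons(cds: str, phase: int = 0) -> str:
    """Drop an incomplete first codon (given by phase) and an incomplete last codon."""
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1 or 2, got {phase}")
    core = cds[phase:]                         # skip bases before the first complete codon
    return core[: len(core) - len(core) % 3]   # drop a trailing partial codon


print(trim_to_complete_codons("CGATGAAACCCTG", phase=2))  # prints "ATGAAACCC"
```

After trimming, the sequence length is a multiple of 3, so translating it cannot produce a partial codon at either end.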

2nd case: the above alignment indicates that a frame-shift exists near the intron boundary. (For example, the A stretch might actually be 3 bases long rather than 4.) Frame-shifts are relatively easy to detect in the middle of exons but more difficult to find near exon ends. I am not sure, but a reduced frame-shift penalty (e.g. -yx20) might help if your genomic sequence is indel-prone.
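To see why a single extra base in an A stretch breaks the ORF, you can translate two versions of the same sequence. The toy sequences below are made up for illustration and are not taken from this issue. The sketch uses the standard genetic code, built from the usual TCAG-ordered 64-letter table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base outermost).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate complete codons only; ambiguous codons become 'X'."""
    cds = cds.upper()
    return "".join(CODONS.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


three_a = "ATGCAAAGCTGGTTC"   # hypothetical CDS with an AAA stretch
four_a = "ATGCAAAAGCTGGTTC"   # same sequence with one extra A in the stretch
print(translate(three_a))     # prints "MQSWF"
print(translate(four_a))      # prints "MQKLV"
```

Both translations agree up to the A stretch. Every residue after it changes, which is the same pattern as the "extracted" and "correct" peptides reported above.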

ogotoh, May 17 '19 08:05