spaln
Two errors in GFF3 output
First, some results are converted to a 0-based start in the GFF3 output. I suspect this is caused by the deletion of the first base. For example, this GFF3 output has a start coordinate of 0:
chr1_AADN03009696_random ALN gene 0 701 460 - . ID=gene01217;Name=chr1_AADN03009696_random_0
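To find such records in a larger GFF3 file, a quick coordinate check may help. The following is a minimal sketch, not part of spaln. The function name `gff3_coordinate_problems` is hypothetical. It assumes the standard 9-column GFF3 layout and falls back to whitespace splitting because the line quoted above appears to have lost its tabs when it was pasted.

```python
def gff3_coordinate_problems(line: str) -> list:
    """Return human-readable problems with the start/end columns of one GFF3 line."""
    # GFF3 columns are tab-separated; fall back to whitespace splitting
    # (at most 9 fields) for lines whose tabs were lost.
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        cols = line.split(None, 8)
    if len(cols) != 9:
        return [f"expected 9 columns, found {len(cols)}"]
    seqid, _source, ftype, start_s, end_s = cols[:5]
    start, end = int(start_s), int(end_s)
    problems = []
    # GFF3 coordinates are 1-based and inclusive, so start must be >= 1.
    if start < 1:
        problems.append(f"{seqid} {ftype}: start {start} < 1 (GFF3 is 1-based)")
    if start > end:
        problems.append(f"{seqid} {ftype}: start {start} > end {end}")
    return problems


line = ("chr1_AADN03009696_random ALN gene 0 701 460 - . "
        "ID=gene01217;Name=chr1_AADN03009696_random_0")
for p in gff3_coordinate_problems(line):
    print(p)
```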
Second, the conversion does not preserve the correct ORF. The capitalized (exon) bases are extracted for the GFF3 output, but the resulting ORF is broken, so I cannot obtain the correct peptide sequences from the GFF3 CDS sequences. For example:
The peptide extracted from the GFF3 CDS:
LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKWPDHGUHTEGSTSSGK
The correct peptide:
LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKGGQTMGDIQKAQPHQE
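Locating the first residue at which the two peptides disagree shows where the reading frame is lost, which can then be mapped back to the nearest exon boundary. A minimal sketch follows; `first_divergence` is a hypothetical helper, not a spaln function.

```python
from typing import Optional


def first_divergence(a: str, b: str) -> Optional[int]:
    """Return the 0-based index of the first differing residue, or None if
    one sequence is a prefix of the other (including identical sequences)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


extracted = "LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKWPDHGUHTEGSTSSGK"
correct = "LHTNTKPGASKCFSLNSVSGFLFKEKVYDCKEEKGGQTMGDIQKAQPHQE"
i = first_divergence(extracted, correct)
print(f"peptides agree for {i} residues; first mismatch {extracted[i]} vs {correct[i]}")
```

For the two peptides above, the first 34 residues agree and the sequences diverge at residue 35 (W versus G), consistent with a frame change partway through the CDS.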
1st case: Spaln does not handle carefully the cases where the first or the last codon is incomplete. The -LS option may help trim such incomplete terminal codons. Note that, because GFF3 uses a 1-based coordinate system, a coordinate of 0 signals an exceptional situation.
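If you prefer to post-process the extracted CDS yourself, incomplete terminal codons can also be trimmed after extraction. The sketch below is a generic illustration, not how spaln's -LS option works. It assumes `cds` is the spliced CDS already in coding orientation and that `phase` is the GFF3 phase of its first segment (the number of bases to skip to reach the first complete codon). The name `trim_to_codons` is hypothetical.

```python
def trim_to_codons(cds: str, phase: int = 0) -> str:
    """Drop `phase` leading bases, then drop trailing bases that do not
    form a complete codon, so the result length is a multiple of 3."""
    if phase not in (0, 1, 2):
        raise ValueError("GFF3 phase must be 0, 1 or 2")
    trimmed = cds[phase:]
    usable = len(trimmed) - len(trimmed) % 3  # largest multiple of 3
    return trimmed[:usable]


print(trim_to_codons("GATGAAACC", phase=1))
```

Trimming only removes partial codons at the ends; it cannot repair a frame-shift inside the CDS.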
2nd case: The alignment above indicates that a frame-shift exists near the intron boundary (for example, the A stretch might actually be 3 bases long rather than 4). Frame-shifts in the middle of exons are relatively easy to detect, but those near exon ends are harder to find. I am not sure, but a reduced frame-shift penalty (e.g., -yx20) might help if your genomic sequence is indel-prone.
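To illustrate why a single extra base in an A stretch breaks the downstream peptide, here is a minimal translation sketch using the standard genetic code (NCBI translation table 1). The sequences are made-up toy examples, not taken from this alignment, and `translate` is a hypothetical helper that ignores any trailing partial codon.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons become 'X' and a trailing
    partial codon is ignored."""
    cds = cds.upper()
    end = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[p:p + 3], "X") for p in range(0, end, 3))


print(translate("ATGAAACCCGGG"))   # A stretch of 3
print(translate("ATGAAAACCCGGG"))  # A stretch of 4: every codon after it shifts
```

The two toy sequences translate to MKPG and MKTR: the residues before the insertion are identical, and everything after it changes, which is the same pattern seen in the extracted versus correct peptides.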