miniprot
Negative length for introns in cs:Z: tag
I noticed this when testing for #33. In the example shown below, the cs:Z: tag uses the following to represent an intron: ~gt-1ag. Is this expected?
$ efetch -db protein -id BAM19251.1 -format fasta > prot.fa
$ efetch -db nucleotide -id NC_069145.1 -format fasta > genome.fa
$ /tmp/miniprot-0.8_x64-linux/miniprot -J 18 genome.fa prot.fa 2>/dev/null
BAM19251.1 210 0 210 - NC_069145.1 87567467 83367290 83368071 546 630 0 AS:i:946 ms:i:1006 np:i:193 da:i:-1 do:i:0 cg:Z:12M2V41M77V66M78N89M cs:Z:*acaM*gcaT*tcgL*acgM:2*ggcD*tcgW*agcR:3*gcS~gt-1ag-c:8*atcV:6*gagP*aacA:8*gagA:3*gagQ:1*attL*tctQ:1*atgV:1*gagQ:4*agR~gt74ag-a:6*aagS*acgA:31*atcV:8*tcgA:5*aacT:1*aacT:9~gt78ag:3*gaaQ*acgM:42*gacE:21*gatE:2*gtgC:16
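For anyone who wants to check their own output for the same thing, here is a minimal sketch that pulls the intron operations out of a cs:Z: string and flags non-positive lengths. It assumes an intron is written as `~`, two donor bases, an integer length, and two acceptor bases, which is simply what the line above shows; `cs_introns` is a made-up helper name, not part of miniprot.

```python
import re

# Assumed shape of an intron operation, taken from the output above:
# "~" + 2 donor bases + (possibly negative) length + 2 acceptor bases.
INTRON_RE = re.compile(r"~([a-z]{2})(-?\d+)([a-z]{2})")


def cs_introns(cs):
    """Return (donor, length, acceptor) for each intron operation in a cs string."""
    return [(donor, int(length), acceptor)
            for donor, length, acceptor in INTRON_RE.findall(cs)]


# The cs:Z: value from the alignment above, without the "cs:Z:" prefix.
cs = ("*acaM*gcaT*tcgL*acgM:2*ggcD*tcgW*agcR:3*gcS~gt-1ag-c:8*atcV:6*gagP"
      "*aacA:8*gagA:3*gagQ:1*attL*tctQ:1*atgV:1*gagQ:4*agR~gt74ag-a:6*aagS"
      "*acgA:31*atcV:8*tcgA:5*aacT:1*aacT:9~gt78ag:3*gaaQ*acgM:42*gacE:21"
      "*gatE:2*gtgC:16")

for donor, length, acceptor in cs_introns(cs):
    note = "  <-- non-positive length" if length <= 0 else ""
    print(f"{donor}..{acceptor} length={length}{note}")
```

On this alignment it reports lengths -1, 74 and 78, next to 2V, 77V and 78N in the cg:Z: tag.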
Oh, 2bp intron. Smells like a bug. I will have a look later.
Actually it occurs to me that -J should be larger than -F; otherwise a frameshift ~~will always~~ may be aligned as an intron. You may try to reduce the frameshift penalty -F. However, with an excessively small -F, you will get more frameshifts in the alignment.
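One way to check whether a given -J/-F combination still yields such alignments is to look for implausibly short introns in the cg:Z: tag. The sketch below is only an illustration: it assumes N, U and V are the intron operators (my reading of the miniprot manual; only N and V appear in the output above), and the 10 bp cutoff is arbitrary. `short_introns` and `min_len` are made-up names.

```python
import re

# One CIGAR operation: a length followed by a single-letter operator.
CIGAR_RE = re.compile(r"(\d+)([A-Z])")

# Assumption: these operators denote introns (phase 0, 1 and 2).
INTRON_OPS = {"N", "U", "V"}


def short_introns(cg, min_len=10):
    """Return (length, op) for every intron operation shorter than min_len."""
    return [(int(length), op)
            for length, op in CIGAR_RE.findall(cg)
            if op in INTRON_OPS and int(length) < min_len]


# The cg:Z: value from the alignment in the original report.
cg = "12M2V41M77V66M78N89M"
print(short_introns(cg))  # [(2, 'V')]
```

Running this over the output before and after changing -F should show whether the 2 bp intron disappears.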
> Actually it occurs to me that -J should be larger than -F
Thanks! Looks like at some point in the past, the default value for -F was set to 17 (see manual). I will try reducing it to 20 and test.
Didn't realize the manpage was that old. Now updated to v0.9.