miniprot
Mapq
For proteins mapping to multiple contigs/chromosomes, how might one deduce the equivalent of mapping quality with miniprot? My guess is one could have a go at the `AS` and `as` scores (although I am seeing `ms` rather than `as` in the resulting PAF files?)
+-----+------+---------------------------------------------------+
| Tag | Type | Description                                       |
+-----+------+---------------------------------------------------+
| AS  | i    | Alignment score from dynamic programming          |
| as  | i    | Alignment score excluding introns                 |
| np  | i    | Number of amino acid matches with positive scores |
| da  | i    | Distance to the nearest start codon               |
| do  | i    | Distance to the nearest stop codon                |
| cg  | Z    | Protein CIGAR                                     |
| cs  | Z    | Difference string                                 |
+-----+------+---------------------------------------------------+
I will add mapping quality in the future. Miniprot doesn't have it now because mapping quality is not very important for cross-species alignment.
The `as` in the manpage has been renamed to `ms`. It is roughly equivalent to the `ms` tag reported by minimap2. Please use this tag to estimate mapping uniqueness. `AS` sometimes favors pseudogenes.
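For anyone wanting to try this before MAPQ exists, here is a minimal sketch of one way to turn `ms` into a rough uniqueness estimate. It is not part of miniprot: the function names and the second-best/best ratio are my own assumptions. It only assumes that the PAF has 12 standard columns followed by `tag:type:value` fields, and it groups hits by the protein name in column 1.

```python
from collections import defaultdict


def parse_paf_tags(fields):
    """Turn optional PAF fields such as 'ms:i:512' into a dict."""
    tags = {}
    for field in fields:
        # Split at most twice so values containing ':' (e.g. cs) stay intact.
        parts = field.split(":", 2)
        if len(parts) != 3:
            continue
        key, typ, value = parts
        tags[key] = int(value) if typ == "i" else value
    return tags


def ms_uniqueness(paf_lines):
    """Per protein, return (best ms, second-best ms, second/best ratio).

    A ratio near 0 suggests the best hit is unique; a ratio near 1
    suggests the protein maps about equally well to a second locus.
    This ratio is a hypothetical heuristic, not a real MAPQ.
    """
    scores = defaultdict(list)
    for line in paf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        tags = parse_paf_tags(cols[12:])
        if "ms" in tags:
            scores[cols[0]].append(tags["ms"])

    result = {}
    for protein, values in scores.items():
        values.sort(reverse=True)
        best = values[0]
        second = values[1] if len(values) > 1 else 0
        ratio = second / best if best > 0 else 1.0
        result[protein] = (best, second, ratio)
    return result
```

You can pass it an open file handle, e.g. `ms_uniqueness(open("aln.paf"))`, where `aln.paf` is a placeholder for your own miniprot output.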
Thank you!
I will keep this issue open as a reminder to myself. BTW, I have just updated the manpage to replace "as" with "ms".
Just wanted to join in to say MAPQ would be a very nice addition. For example, I am working with sponges and have ~50 sponge transcriptomes that I am mapping to a new species that I am trying to annotate. For each locus in the genome, it would be nice to be able to filter out poor matches based on MAPQ in the PAF line. Thanks for writing this nice piece of software, @lh3. I had been using a tblastn pipeline to perform a similar function before this.
MAPQ won't be very useful for filtering poor matches. You should look at the alignment score, the identity and the number of positive matches (`np`) instead.
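As a rough illustration of filtering on those three quantities, here is a sketch under some assumptions: identity is taken as PAF column 10 divided by column 11, the positive fraction as `np` divided by the aligned protein length (column 4 minus column 3), and the score as `ms`. The function names and the default thresholds are made up for the example and should be tuned for your own data.

```python
def alignment_metrics(paf_line):
    """Extract score, identity and positive fraction from one PAF line."""
    cols = paf_line.rstrip("\n").split("\t")
    qstart, qend = int(cols[2]), int(cols[3])
    matches, block_len = int(cols[9]), int(cols[10])

    # Collect integer tags such as ms:i:512 and np:i:240.
    tags = {}
    for field in cols[12:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[1] == "i":
            tags[parts[0]] = int(parts[2])

    aligned_aa = qend - qstart
    positive = None
    if "np" in tags and aligned_aa > 0:
        positive = tags["np"] / aligned_aa

    return {
        "score": tags.get("ms"),
        "identity": matches / block_len if block_len else 0.0,
        "positive": positive,
    }


def passes(paf_line, min_score=100, min_identity=0.5, min_positive=0.6):
    """Return True if the hit clears all three (hypothetical) thresholds."""
    m = alignment_metrics(paf_line)
    if m["score"] is None or m["positive"] is None:
        return False
    return (m["score"] >= min_score
            and m["identity"] >= min_identity
            and m["positive"] >= min_positive)
```

Hits that lack an `ms` or `np` tag are rejected here rather than guessed at; relax that if your files omit those tags.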