miniprot

Mapq

Open jelber2 opened this issue 2 years ago • 5 comments

For proteins mapping to multiple contigs/chromosomes, how might one deduce the equivalent of mapping quality with miniprot? My guess is that one could use the AS and as scores (although I am seeing ms rather than as in the resulting PAF files).

+----+------+---------------------------------------------------+
|Tag | Type |                    Description                    |
+----+------+---------------------------------------------------+
| AS |  i   | Alignment score from dynamic programming          |
| as |  i   | Alignment score excluding introns                 |
| np |  i   | Number of amino acid matches with positive scores |
| da |  i   | Distance to the nearest start codon               |
| do |  i   | Distance to the nearest stop codon                |
| cg |  Z   | Protein CIGAR                                     |
| cs |  Z   | Difference string                                 |
+----+------+---------------------------------------------------+

jelber2, Sep 14 '22 10:09

I will add mapping quality in the future. Miniprot doesn't have it now because mapping quality is not very important for cross-species alignment.

The "as" tag in the manpage has been renamed to "ms". It is roughly equivalent to the "ms" tag reported by minimap2. Please use this tag to estimate mapping uniqueness. AS sometimes favors pseudogenes.

lh3, Sep 14 '22 13:09
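
As an illustration of the suggestion above, here is a minimal sketch of one way to estimate mapping uniqueness from the "ms" tag: for each protein, compare the best "ms" score with the second-best one across all of its hits. The function names (`parse_paf_tags`, `ms_uniqueness`) and the ratio-based measure are assumptions made for this example; they are not part of miniprot, and this is not how miniprot would compute MAPQ. The sketch only assumes that optional tags start at PAF column 13 in the usual `name:type:value` form.

```python
from collections import defaultdict


def parse_paf_tags(fields):
    """Turn optional PAF tag fields (columns 13+) into a dict, e.g. {'ms': 1500}."""
    tags = {}
    for item in fields:
        name, typ, value = item.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return tags


def ms_uniqueness(paf_lines):
    """Map each protein name to (best ms, second-best ms, second/best ratio).

    A ratio near 0 suggests a unique best hit; a ratio near 1 suggests
    that two or more loci score almost equally well. This measure is a
    hypothetical heuristic, not something defined by miniprot.
    """
    scores = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        tags = parse_paf_tags(cols[12:])
        if "ms" in tags:
            scores[cols[0]].append(tags["ms"])

    result = {}
    for protein, ms_list in scores.items():
        ms_list.sort(reverse=True)
        best = ms_list[0]
        second = ms_list[1] if len(ms_list) > 1 else 0
        ratio = second / best if best > 0 else 1.0
        result[protein] = (best, second, ratio)
    return result
```

To use it on real output, one could pass an open PAF file directly, for example `ms_uniqueness(open("out.paf"))`, where `out.paf` is a placeholder file name.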

Thank you!

jelber2, Sep 14 '22 13:09

I will keep this issue open as a reminder to myself. BTW, I have just updated the manpage to replace "as" with "ms".

lh3, Sep 14 '22 13:09

Just wanted to join in to say MAPQ would be a very nice addition. For example, I am working with sponges, and have ~50 sponge transcriptomes that I am mapping to a new species that I am trying to annotate. For each locus in the genome, it would be nice to be able to filter out poor matches based on MAPQ in the PAF line. Thanks for writing this nice piece of software, @lh3. I had been using a tblastn pipeline to perform a similar function before this.

conchoecia, Sep 16 '22 14:09

MAPQ won't be very useful for filtering poor matches. You should look at the alignment score, identity, and the number of positive matches instead.

lh3, Sep 17 '22 01:09
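
A minimal sketch of filtering on score, identity and positives, as suggested above, might look like the following. It makes several assumptions: PAF column 10 holds the number of matching nucleotides and column 11 the alignment length excluding introns (the usual PAF convention), the positive fraction is taken as "np" divided by the aligned protein length, and the thresholds are arbitrary placeholders to be tuned per dataset. The function names `alignment_metrics` and `keep_hit` are hypothetical and not part of miniprot.

```python
def alignment_metrics(line):
    """Compute ms score, identity and positive fraction for one PAF line."""
    cols = line.rstrip("\n").split("\t")

    # Optional tags start at column 13 and look like "ms:i:1500".
    tags = {}
    for item in cols[12:]:
        name, typ, value = item.split(":", 2)
        tags[name] = int(value) if typ == "i" else value

    aligned_aa = int(cols[3]) - int(cols[2])   # aligned protein residues
    matches = int(cols[9])                     # assumed: matching nucleotides
    block_len = int(cols[10])                  # assumed: alignment length excl. introns

    return {
        "protein": cols[0],
        "ms": tags.get("ms", 0),
        "identity": matches / block_len if block_len else 0.0,
        "positive": tags.get("np", 0) / aligned_aa if aligned_aa else 0.0,
    }


def keep_hit(line, min_ms=100, min_identity=0.5, min_positive=0.6):
    """Return True if the hit passes all three (placeholder) thresholds."""
    m = alignment_metrics(line)
    return (
        m["ms"] >= min_ms
        and m["identity"] >= min_identity
        and m["positive"] >= min_positive
    )
```

For example, `[l for l in open("out.paf") if keep_hit(l)]` would retain only the hits that pass all three cutoffs, with `out.paf` again standing in for a real file name.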