minimap2

inconsistent `ms` scores

Open · xinehc opened this issue · 3 comments

Hi,

When aligning a sequence to an augmented reference database, I got a different ms score. Does ms depend on tp?

One reference seq:

minimap2 -cx map-ont one.fa query.fa
[M::mm_idx_gen::0.001*5.95] collected minimizers
[M::mm_idx_gen::0.002*4.99] sorted minimizers
[M::main::0.002*4.98] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.002*4.79] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.002*4.63] distinct minimizers: 1848 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.411; total length: 10000
SRR23690005.810506	4901	0	4887	+	NZ_CP060632.1_2416720-2426719:-	10000	1879	6750	4751	4890	60	NM:i:139	ms:i:8942	AS:i:8934	nn:i:0	tp:A:P	cm:i:676	s1:i:4011	s2:i:0	de:f:0.0268	rl:i:0	cg:Z:68M2I31M1I1203M1I1788M1D266M4I33M3I208M1D81M1I311M1I323M1I134M2I233M1D1M2I4M1I184M

Two reference seqs:

minimap2 -cx map-ont two.fa query.fa
[M::mm_idx_gen::0.004*3.47] collected minimizers
[M::mm_idx_gen::0.005*3.31] sorted minimizers
[M::main::0.005*3.31] loaded/built the index for 2 target sequence(s)
[M::mm_mapopt_update::0.005*3.22] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
[M::mm_idx_stat::0.006*3.15] distinct minimizers: 2397 (45.06% are singletons); average occurrences: 1.549; average spacing: 5.385; total length: 20000
SRR23690005.810506	4901	0	4887	+	CABOHD010000001.1_23218-33217:-	10000	257	5129	4847	4891	60	NM:i:44	ms:i:7856	AS:i:9504	nn:i:0	tp:A:P	cm:i:842	s1:i:4700	s2:i:4011	de:f:0.0074	rl:i:0	cg:Z:67M1I4M1D2M2I1229M1I1788M1D266M4I33M3I208M1D81M1I311M1I323M1I134M2I233M1D1M2I4M1I184M
SRR23690005.810506	4901	0	4887	+	NZ_CP060632.1_2416720-2426719:-	10000	1879	6750	4751	4890	0	NM:i:139	ms:i:2916	AS:i:8934	nn:i:0	tp:A:S	cm:i:676	s1:i:4011	de:f:0.0268	rl:i:0	cg:Z:68M2I31M1I1203M1I1788M1D266M4I33M3I208M1D81M1I311M1I323M1I134M2I233M1D1M2I4M1I184M

For the NZ_CP060632.1 hit, AS, de and cg are identical in both runs, yet ms drops from 8942 to 2916.
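To make the comparison concrete, here is a minimal Python sketch (not part of minimap2) that parses the optional `TAG:TYPE:VALUE` fields of a PAF record and reports which tags differ between the NZ_CP060632.1 hit in the two runs. The helper names `parse_paf_tags` and `differing_tags` are hypothetical, and the sample records are the ones above with most tags (including `cg`) trimmed for brevity.

```python
# Hypothetical helpers (not part of minimap2): compare PAF optional tags
# for the same hit across two minimap2 runs.

def parse_paf_tags(line):
    """Return (target_name, tags) for one PAF record.

    PAF has 12 mandatory tab-separated columns; column 6 is the target
    name, and any further columns are SAM-style TAG:TYPE:VALUE fields.
    """
    fields = line.rstrip("\n").split("\t")
    target = fields[5]
    tags = {}
    for field in fields[12:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:  # A (char), Z (string) and other types are kept as text
            tags[tag] = value
    return target, tags


def differing_tags(a, b, keys=("AS", "de", "ms", "tp")):
    """Return {tag: (value_in_a, value_in_b)} for the listed tags that differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# The NZ_CP060632.1 record from each run, trimmed to a few tags.
ONE_REF = ("SRR23690005.810506\t4901\t0\t4887\t+\t"
           "NZ_CP060632.1_2416720-2426719:-\t10000\t1879\t6750\t4751\t4890\t60\t"
           "NM:i:139\tms:i:8942\tAS:i:8934\ttp:A:P\tde:f:0.0268")
TWO_REF = ("SRR23690005.810506\t4901\t0\t4887\t+\t"
           "NZ_CP060632.1_2416720-2426719:-\t10000\t1879\t6750\t4751\t4890\t0\t"
           "NM:i:139\tms:i:2916\tAS:i:8934\ttp:A:S\tde:f:0.0268")

_, tags_one = parse_paf_tags(ONE_REF)
_, tags_two = parse_paf_tags(TWO_REF)
print(differing_tags(tags_one, tags_two))
# {'ms': (8942, 2916), 'tp': ('P', 'S')}
```

Only optional tags are compared here; the mandatory MAPQ column (60 vs 0) is left out because it is not a `TAG:TYPE:VALUE` field.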

Here are the sequences: example.zip

xinehc · Jan 08 '24 07:01

tp is determined by ms

lh3 · Jan 08 '24 12:01

Thanks for the explanation, but why did ms drop from 8942 to 2916 even though the two alignments are effectively identical? Is this intended behavior for ms?

xinehc · Jan 08 '24 13:01

Oh, I missed that. Will have a look.

lh3 · Jan 08 '24 13:01

Fixed. Thanks for the example. This is an oversight that went unnoticed for three years.

lh3 · Mar 11 '24 21:03