MashMap
Inconsistent alignments depending on whether reference file is gzipped or not
Hi,
I am using MashMap v3.1.3 and noticed that the same alignment is reported multiple times in the output file. For example, here is the output for reference chromosome 21:
ptg000091l 861542 0 861542 + chr21 45090682 152523 1014065 13 861542 20 id:f:0.990785 kc:f:0.283563
ptg000217l 527266 0 527266 + chr21 45090682 178973 706239 12 527266 20 id:f:0.990089 kc:f:0.331461
ptg000217l 527266 0 527266 + chr21 45090682 178973 706239 12 527266 20 id:f:0.990089 kc:f:0.331461
ptg000203l 686331 0 686331 + chr21 45090682 179063 865394 15 686331 25 id:f:0.996609 kc:f:0.235299
ptg000268l 916773 0 916773 + chr21 45090682 582998 1499771 3 916773 12 id:f:0.93172 kc:f:0.0635271
ptg000268l 916773 0 916773 + chr21 45090682 582998 1499771 3 916773 12 id:f:0.93172 kc:f:0.0635271
ptg000268l 916773 0 916773 + chr21 45090682 582998 1499771 3 916773 12 id:f:0.93172 kc:f:0.0635271
ptg000145l 2623179 0 2623179 + chr21 45090682 1133648 3756827 12 2623179 19 id:f:0.98662 kc:f:0.255438
ptg000158l 1222257 0 1222257 + chr21 45090682 1565548 2787805 3 1222257 12 id:f:0.936192 kc:f:0.214653
ptg000158l 1222257 0 1222257 + chr21 45090682 1565548 2787805 3 1222257 12 id:f:0.936192 kc:f:0.214653
ptg000158l 1222257 0 1222257 + chr21 45090682 1565548 2787805 3 1222257 12 id:f:0.936192 kc:f:0.214653
ptg000158l 1222257 0 1222257 + chr21 45090682 1565548 2787805 3 1222257 12 id:f:0.936192 kc:f:0.214653
ptg000158l 1222257 0 1222257 + chr21 45090682 1565548 2787805 3 1222257 12 id:f:0.936192 kc:f:0.214653
ptg000158l 1222257 0 1222257 + chr21 45090682 1565548 2787805 3 1222257 12 id:f:0.936192 kc:f:0.214653
ptg000160l 848270 0 848270 + chr21 45090682 2308763 3157033 8 848270 16 id:f:0.972836 kc:f:0.583853
ptg000160l 848270 0 848270 + chr21 45090682 2308763 3157033 8 848270 16 id:f:0.972836 kc:f:0.583853
ptg000151l 555938 0 555938 + chr21 45090682 2459556 3015494 15 555938 22 id:f:0.993434 kc:f:1.23635
ptg000172l 1014256 0 1014256 + chr21 45090682 2577878 3592134 10 1014256 17 id:f:0.980634 kc:f:0.597447
ptg000215l 318592 0 318592 + chr21 45090682 5433633 5752225 19 318592 29 id:f:0.998634 kc:f:0.130072
ptg000025l 2865536 0 2865536 + chr21 45090682 5830375 8695911 10 2865536 17 id:f:0.978886 kc:f:0.474346
ptg000176l 584981 0 584981 - chr21 45090682 8297841 8882822 9 584981 16 id:f:0.977014 kc:f:0.599697
ptg000176l 584981 0 584981 + chr21 45090682 8297841 8882822 9 584981 16 id:f:0.977014 kc:f:0.599697
ptg000176l 584981 0 584981 + chr21 45090682 8297841 8882822 9 584981 16 id:f:0.977014 kc:f:0.599697
ptg000176l 584981 0 584981 + chr21 45090682 8297841 8882822 9 584981 16 id:f:0.977014 kc:f:0.599697
ptg000093l 3796125 0 3796125 + chr21 45090682 8542762 12338887 19 3796125 29 id:f:0.998634 kc:f:0.41517
ptg000125l 2794119 0 2794119 + chr21 45090682 9318318 12112437 18 2794119 25 id:f:0.997158 kc:f:0.524907
ptg000290l 812527 0 812527 + chr21 45090682 10614047 11426574 4 812527 13 id:f:0.945935 kc:f:0.117817
ptg000293l 206332 0 206332 + chr21 45090682 10988834 11195166 10 206332 17 id:f:0.978886 kc:f:0.0164015
ptg000292l 229265 0 229265 + chr21 45090682 11199120 11428385 11 229265 17 id:f:0.982112 kc:f:0.0476858
ptg000033l 27686851 0 27686851 + chr21 45090682 24117979 45090681 20 27686851 255 id:f:1 kc:f:0.872569
ptg000115l 3037875 0 3037875 + chr21 45090682 38473000 41510875 18 3037875 25 id:f:0.997158 kc:f:0.865375
ptg000147l 1146249 0 1146249 + chr21 45090682 42104328 43250577 19 1146249 29 id:f:0.998634 kc:f:1.36605
ptg000047l 2315813 0 2315813 + chr21 45090682 43904919 45090681 19 2315813 29 id:f:0.998634 kc:f:1.30252
I found this odd, so I re-ran it. This time I happened to use an uncompressed version of my reference sequence file. Most of the duplicated alignments disappeared (ptg000176l is still reported three times on the + strand), but I got some new alignments and the positions of previously found alignments changed. Here again is the output for reference chr21:
ptg000091l 861542 0 861542 + chr21 45090682 173803 1035345 29 861542 22 id:f:0.993222 kc:f:0.33354
ptg000203l 686331 0 686331 + chr21 45090682 179063 865394 30 686331 22 id:f:0.994209 kc:f:0.315678
ptg000145l 2623179 0 2623179 + chr21 45090682 1170940 3794119 24 2623179 19 id:f:0.98662 kc:f:0.182862
ptg000151l 555938 0 555938 + chr21 45090682 2499197 3055135 24 555938 19 id:f:0.98662 kc:f:0.900919
ptg000172l 1014256 0 1014256 + chr21 45090682 2682656 3696912 27 1014256 20 id:f:0.990289 kc:f:0.642754
ptg000215l 318592 0 318592 + chr21 45090682 5420070 5738662 38 318592 29 id:f:0.998634 kc:f:0.145981
ptg000025l 2865536 0 2865536 + chr21 45090682 7220950 10086486 21 2865536 17 id:f:0.981403 kc:f:0.455276
ptg000176l 584981 0 584981 - chr21 45090682 8381941 8966922 16 584981 16 id:f:0.971897 kc:f:0.557647
ptg000176l 584981 0 584981 + chr21 45090682 8381941 8966922 16 584981 16 id:f:0.971897 kc:f:0.557647
ptg000176l 584981 0 584981 + chr21 45090682 8381941 8966922 16 584981 16 id:f:0.971897 kc:f:0.557647
ptg000176l 584981 0 584981 + chr21 45090682 8381941 8966922 16 584981 16 id:f:0.971897 kc:f:0.557647
ptg000125l 2794119 0 2794119 + chr21 45090682 9374562 12168681 33 2794119 23 id:f:0.995431 kc:f:0.500231
ptg000293l 206332 0 206332 + chr21 45090682 11091705 11298037 21 206332 17 id:f:0.980549 kc:f:0.0194184
ptg000292l 229265 0 229265 + chr21 45090682 11199120 11428385 25 229265 19 id:f:0.986286 kc:f:0.0644872
ptg000033l 27686851 0 27686851 + chr21 45090682 24229362 45090681 40 27686851 255 id:f:1 kc:f:1.00259
ptg000115l 3037875 0 3037875 + chr21 45090682 39989460 43027335 37 3037875 27 id:f:0.997911 kc:f:0.951376
ptg000147l 1146249 0 1146249 + chr21 45090682 42183625 43329874 38 1146249 29 id:f:0.998634 kc:f:1.11024
ptg000047l 2315813 0 2315813 + chr21 45090682 43904919 45090681 38 2315813 29 id:f:0.998634 kc:f:1.22872
I used these commands:
mashmap --perc_identity 95 --noSplit -r hs1.fa.gz -q hifiasm.bp.unified.fa --threads 32 -o test1.mashmap
gunzip hs1.fa.gz
mashmap --perc_identity 95 --noSplit -r hs1.fa -q hifiasm.bp.unified.fa --threads 32 -o test.mashmap
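To put numbers on the differences between the two runs, a small script along these lines might help. It is only a sketch: it assumes the whitespace-separated column order visible in the output above (query name in column 1, strand in column 5, reference name in column 6, reference start in column 8), and the function names are my own, not part of MashMap.

```python
import sys
from collections import Counter


def read_records(lines):
    """Count each non-empty record after normalizing whitespace."""
    return Counter(" ".join(line.split()) for line in lines if line.strip())


def duplicated(counts):
    """Return only the records that occur more than once."""
    return {rec: n for rec, n in counts.items() if n > 1}


def target_starts(counts):
    """Map (query, target, strand) to the set of reported target starts.

    Assumes the column order seen in the output above:
    query, length, start, end, strand, target, length, start, end, ...
    """
    starts = {}
    for rec in counts:
        f = rec.split(" ")
        starts.setdefault((f[0], f[5], f[4]), set()).add(int(f[7]))
    return starts


def compare(lines_a, lines_b):
    """Summarize duplicates, missing mappings, and shifted target starts."""
    a, b = read_records(lines_a), read_records(lines_b)
    sa, sb = target_starts(a), target_starts(b)
    return {
        "dups_a": duplicated(a),
        "dups_b": duplicated(b),
        "only_a": sorted(sa.keys() - sb.keys()),
        "only_b": sorted(sb.keys() - sa.keys()),
        "shifted": sorted(k for k in sa.keys() & sb.keys() if sa[k] != sb[k]),
    }


# Runs only when two output files are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        report = compare(fa, fb)
    for name, value in report.items():
        print(name, len(value))
```

Saved under a hypothetical name such as compare_mashmap.py, it could be run as `python compare_mashmap.py test1.mashmap test.mashmap` to print how many records are duplicated in each file, how many query/reference/strand combinations appear in only one file, and how many changed their reference start position.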
Any idea what could trigger this behavior?
Thanks, Aaron