Meaning of fields in the output PAF file
Hi, first I'd like to thank you for developing this tool!
I am a little confused about the meaning of the fields in the PAF output produced by the software. According to the PAF format, column 10 should represent the number of matched bases, while column 11 indicates the total alignment length. However, in my result, the number of matches is far smaller than the total length. Could you please clarify the meaning of these two columns? I am using mashmap 3.1.3, and the command line is mashmap -q ptg004078l_revcomp.fa -r DWv2.1_chr2B.fa -t 120 -o DW/4078vsDW.mashmap.paf
My result looks like this:
ptg004078l 4455011 0 65000 + LT934114.1 790338525 18792651 18856944 79 65000 18 id:f:0.983043 kc:f:1.0035
ptg004078l 4455011 65000 90000 + LT934114.1 790338525 18877319 18899784 47 25000 15 id:f:0.968634 kc:f:0.982086
ptg004078l 4455011 90000 125000 + LT934114.1 790338525 18918968 18953818 59 35000 18 id:f:0.982738 kc:f:1.00158
ptg004078l 4455011 125000 130000 + LT934114.1 790338525 18980649 18985649 70 5000 17 id:f:0.981662 kc:f:0.928003
ptg004078l 4455011 130000 630000 + LT934114.1 790338525 18995211 19496613 114 501402 28 id:f:0.998313 kc:f:0.96713
ptg004078l 4455011 630000 1160000 + LT934114.1 790338525 19504070 20036872 127 532802 31 id:f:0.999273 kc:f:0.995263
ptg004078l 4455011 1165000 1175000 + LT934114.1 790338525 20036919 20044611 18 10000 14 id:f:0.961334 kc:f:0.954291
ptg004078l 4455011 1175000 1190000 + LT934114.1 790338525 20024503 20038160 107 15000 18 id:f:0.983895 kc:f:0.926864
ptg004078l 4455011 1190000 1195000 + LT934114.1 790338525 447648420 447653420 2 5000 8 id:f:0.831914 kc:f:0.954308
ptg004078l 4455011 1195000 1205000 + LT934114.1 790338525 20037554 20044809 41 10000 16 id:f:0.974318 kc:f:0.985881
Furthermore, what do the two additional tags in columns 13 and 14 mean? Does the ‘id’ tag in column 13 refer to the identity of the alignment?
As a beginner, if I have misunderstood something obvious, please excuse my lack of knowledge. Thank you!
Hi @Dentalium, thanks for asking! Since MashMap is an "approximate" mapping method, it does not actually align any bases. When segments are not merged, the 10th column tracks how many sketched k-mers are shared between the reference and query for a segment mapping. These numbers are not updated during the merging step, though, so for the most part you can disregard them.
The id tag is the estimated identity of a mapping. The kc tag is an estimate of the "k-mer complexity." A number closer to 0.0 would mean that the mapped region has many repeated k-mers (e.g. a highly repetitive region).
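As a rough illustration of how these fields could be read downstream, here is a minimal Python sketch. It is not part of MashMap; the function name `parse_mashmap_paf_line` and the dictionary keys are hypothetical, chosen only for this example. It assumes the twelve standard PAF columns followed by `name:type:value` tags, as in the output above, and keeps column 10 under a neutral name because it holds a sketch-level count rather than matched bases.

```python
def parse_mashmap_paf_line(line):
    """Split one PAF line into named fields (hypothetical helper, not MashMap code)."""
    # PAF is tab-separated; split() also copes with space-separated pasted lines.
    fields = line.split()
    record = {
        "query_name": fields[0],
        "query_len": int(fields[1]),
        "query_start": int(fields[2]),
        "query_end": int(fields[3]),
        "strand": fields[4],
        "ref_name": fields[5],
        "ref_len": int(fields[6]),
        "ref_start": int(fields[7]),
        "ref_end": int(fields[8]),
        # Column 10: per the reply above, a count of shared sketched k-mers
        # for unmerged segments, not a number of matched bases.
        "col10": int(fields[9]),
        "block_len": int(fields[10]),
        "mapq": int(fields[11]),
    }
    tags = {}
    for tag in fields[12:]:
        # Tags look like "id:f:0.983043": name, type code, value.
        name, type_code, value = tag.split(":", 2)
        tags[name] = float(value) if type_code == "f" else value
    record["tags"] = tags
    return record


# First line of the output shown in the question.
example = (
    "ptg004078l 4455011 0 65000 + LT934114.1 790338525 "
    "18792651 18856944 79 65000 18 id:f:0.983043 kc:f:1.0035"
)
rec = parse_mashmap_paf_line(example)
print(rec["tags"]["id"], rec["tags"]["kc"], rec["col10"], rec["block_len"])
```

With a parser like this, filtering on `rec["tags"]["id"]` (estimated identity) is likely more meaningful than any ratio of column 10 to column 11.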
Please let me know if you have any questions, thanks!
I see. Thank you!