motifs column missing in extract calls when --cpg is used
In the modkit documentation for extract calls, this is the column definition for "motifs":
| column | name | description | type |
|---|---|---|---|
| 24 | motifs | comma-separated list of reference motifs matching at this position, only present when --motifs or --cpg is used | str |
My command was (--cpg is right before --log-filepath):
[modkit-logging/src/lib.rs::69][2025-10-13 20:16:01][DEBUG] command line: modkit extract calls --force --threads 10 --interval-size 10000 --queue-size 100 --mapped-only --pass-only --reference ref.fa **--cpg** --log-filepath file.log file.sorted.bam file.modkit.calls.tsv
and here are the first few lines of the calls.tsv:
read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end alignment_start alignment_end read_length call_prob call_code base_qual ref_kmer query_kmer canonical_base modified_primary_base fail inferred within_alignment flag
ab10c180-eae2-43df-bda6-baa0676140c3 59 21947 chr1 + + + 59 18 21947 22491 615 0.796875 - 7 CACGT CTCGT C C false false true 0
ab10c180-eae2-43df-bda6-baa0676140c3 134 22024 chr1 + + + 59 18 21947 22491 615 0.90625 - 8 CTCGC CTCGC C C false false true 0
ab10c180-eae2-43df-bda6-baa0676140c3 354 22245 chr1 + + + 59 18 21947 22491 615 0.82421875 - 18 CACGA CACGA C C false false true 0
ab10c180-eae2-43df-bda6-baa0676140c3 518 22409 chr1 + + + 59 18 21947 22491 615 0.9746094 h 15 GGCGT AGCGT C C false false true 0
ab10c180-eae2-43df-bda6-baa0676140c3 534 22427 chr1 + + + 59 18 21947 22491 615 0.9707031 h 7 ACCGA TGCGA C C false false true 0
ab10c180-eae2-43df-bda6-baa0676140c3 567 22461 chr1 + + + 59 18 21947 22491 615 0.9941406 m 12 CACGC CACGC C C false false true 0
ab10c180-eae2-43df-bda6-baa0676140c3 569 22463 chr1 + + + 59 18 21947 22491 615 0.9394531 m 14 CGCGG CGCGG C C false false true 0
e5085a3c-83ff-42c8-98e4-4bcb954f020a 106 22348 chr1 + + + 33 14 22272 22751 509 0.8769531 m 17 TCCGA TCCGA C C false false true 0
e5085a3c-83ff-42c8-98e4-4bcb954f020a 162 22409 chr1 + + + 33 14 22272 22751 509 0.9707031 m 12 GGCGT GGCGT C C false false true 0
Seems like it should output a motifs column?
Hello @billytcl,
This column is only included when more than one motif is used. For example, if you add --cpg --motif CGCG 0, you will get this column, indicating which motif(s) match at each position. I'll update the documentation, thanks for noticing.
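To check quickly whether a given output file has the column, something like the following should work. It is only a sketch: it assumes the output is tab-separated with a single header line, as in the preview above, and `header_columns` and `has_motifs_column` are hypothetical helper names, not part of modkit.

```python
import csv
import io


def header_columns(tsv_handle):
    """Return the column names from the first line of a tab-separated file.

    Returns an empty list if the file has no lines at all.
    """
    reader = csv.reader(tsv_handle, delimiter="\t")
    return next(reader, [])


def has_motifs_column(tsv_handle):
    """Report whether the header contains a column named 'motifs'."""
    return "motifs" in header_columns(tsv_handle)


# Small in-memory examples standing in for real output files.
without_motifs = io.StringIO("read_id\tref_position\tflag\n")
with_motifs = io.StringIO("read_id\tref_position\tflag\tmotifs\n")

print(has_motifs_column(without_motifs))
print(has_motifs_column(with_motifs))
```

On a real file, open it first, for example `with open("file.modkit.calls.tsv") as fh:`, and pass `fh` to `has_motifs_column`.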
Hi,
We are facing a similar issue: even though we specified multiple motifs, the last column (motifs) was not in the output.
This is the command we used:
modkit extract calls --reference $ref -t 8 --log-filepath $out"/extract_calls.log" --motif CG 0 --motif CHG 0 --motif CHH 0 --mapped-only --region Chr1 --include-bed $bed_file $bam $out"/test_extract_calls_contexts.tsv"
and this is a preview of the output:
read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length call_prob call_code base_qual ref_kmer query_kmer canonical_base modified_primary_base fail inferred within_alignment flag
0dab652e-1f72-4356-9e50-408651fc30cc 18381 394574 Chr1 + + + 0 0 27968 1 - 50 GTCTC GTCTC C C false true true 0
0dab652e-1f72-4356-9e50-408651fc30cc 18383 394576 Chr1 + + + 0 0 27968 1 - 50 CTCTT CTCTT C C false true true 0
0dab652e-1f72-4356-9e50-408651fc30cc 18386 394579 Chr1 + + + 0 0 27968 1 - 41 TTCTT TTCTT C C false true true 0
0dab652e-1f72-4356-9e50-408651fc30cc 18410 394603 Chr1 + + + 0 0 27968 1 - 35 GACTA GACTA C C false true true 0
0dab652e-1f72-4356-9e50-408651fc30cc 18431 394626 Chr1 + + + 0 0 27968 1 - 39 GACAT GACAT C C false true true 0
1f1c144e-01d7-4980-a8bc-7d98f4797f69 16165 394574 Chr1 + + + 0 0 17942 1 - 50 GTCTC GTCTC C C false true true 0
1f1c144e-01d7-4980-a8bc-7d98f4797f69 16167 394576 Chr1 + + + 0 0 17942 1 - 50 CTCTT CTCTT C C false true true 0
1f1c144e-01d7-4980-a8bc-7d98f4797f69 16170 394579 Chr1 + + + 0 0 17942 1 - 50 TTCTT TTCTT C C false true true 0
1f1c144e-01d7-4980-a8bc-7d98f4797f69 16194 394603 Chr1 + + + 0 0 17942 1 - 37 GACTA GACTA C C false true true 0
1f1c144e-01d7-4980-a8bc-7d98f4797f69 16221 394626 Chr1 + + + 0 0 17942 1 - 43 GACAT GACAT C C false true true 0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14 15815 394574 Chr1 + + + 0 0 24395 1 - 15 GTCTC GTCTC C C false true true 0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14 15817 394576 Chr1 + + + 0 0 24395 1 - 18 CTCTT CTCTT C C false true true 0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14 15820 394579 Chr1 + + + 0 0 24395 1 - 10 TTCTT TTCTT C C false true true 0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14 15844 394603 Chr1 + + + 0 0 24395 0.46484375 - 14 GACTA GACTG C C true false true 0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14 15860 394626 Chr1 + + + 0 0 24395 1 - 7 GACAT GACAT C C false true true 0
@jkh00 could you tell me the output of modkit --version?
Hi, it was indeed the older version (0.4.1)!
We updated to the new version (0.5.1) and the column is showing now 👍
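For anyone else hitting this, a rough way to compare an installed version against the one that worked here is sketched below. It assumes that `modkit --version` prints text containing a dotted x.y.z version number (the exact output format is an assumption). `KNOWN_GOOD` is set to 0.5.1 only because that version was reported to work in this thread; it is not an official minimum. `parse_version`, `installed_modkit_version` and `KNOWN_GOOD` are hypothetical names.

```python
import re
import subprocess

# Version reported to show the motifs column in this thread (not an official minimum).
KNOWN_GOOD = (0, 5, 1)


def parse_version(text):
    """Extract the first dotted x.y.z version number in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_modkit_version():
    """Run `modkit --version` and parse its output (output format is assumed)."""
    result = subprocess.run(
        ["modkit", "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


# Tuples compare element by element, so 0.4.1 sorts before 0.5.1.
print(parse_version("0.4.1") < KNOWN_GOOD)
```

With modkit on the PATH, `installed_modkit_version() >= KNOWN_GOOD` gives the same comparison for the local install.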