motifs column missing in extract calls when --cpg is used

Open billytcl opened this issue 2 months ago • 4 comments

In the modkit documentation for extract calls, this is the column definition for "motifs"

24 motifs comma-separated list of reference motifs matching at this position, only present when --motifs or --cpg is used str

My command was (--cpg is right before --log-filepath):

[modkit-logging/src/lib.rs::69][2025-10-13 20:16:01][DEBUG] command line: modkit extract calls --force --threads 10 --interval-size 10000 --queue-size 100 --mapped-only --pass-only --reference ref.fa **--cpg** --log-filepath file.log file.sorted.bam file.modkit.calls.tsv

and here is the first couple of lines of the calls.tsv:

read_id forward_read_position   ref_position    chrom   mod_strand      ref_strand      ref_mod_strand  fw_soft_clipped_start   fw_soft_clipped_end     alignment_start alignment_end   read_length     call_prob       call_code       base_qual       ref_kmer    query_kmer      canonical_base  modified_primary_base   fail    inferred        within_alignment        flag
ab10c180-eae2-43df-bda6-baa0676140c3    59      21947   chr1    +       +       +       59      18      21947   22491   615     0.796875        -       7       CACGT   CTCGT   C       C       false   false   true    0
ab10c180-eae2-43df-bda6-baa0676140c3    134     22024   chr1    +       +       +       59      18      21947   22491   615     0.90625 -       8       CTCGC   CTCGC   C       C       false   false   true    0
ab10c180-eae2-43df-bda6-baa0676140c3    354     22245   chr1    +       +       +       59      18      21947   22491   615     0.82421875      -       18      CACGA   CACGA   C       C       false   false   true    0
ab10c180-eae2-43df-bda6-baa0676140c3    518     22409   chr1    +       +       +       59      18      21947   22491   615     0.9746094       h       15      GGCGT   AGCGT   C       C       false   false   true    0
ab10c180-eae2-43df-bda6-baa0676140c3    534     22427   chr1    +       +       +       59      18      21947   22491   615     0.9707031       h       7       ACCGA   TGCGA   C       C       false   false   true    0
ab10c180-eae2-43df-bda6-baa0676140c3    567     22461   chr1    +       +       +       59      18      21947   22491   615     0.9941406       m       12      CACGC   CACGC   C       C       false   false   true    0
ab10c180-eae2-43df-bda6-baa0676140c3    569     22463   chr1    +       +       +       59      18      21947   22491   615     0.9394531       m       14      CGCGG   CGCGG   C       C       false   false   true    0
e5085a3c-83ff-42c8-98e4-4bcb954f020a    106     22348   chr1    +       +       +       33      14      22272   22751   509     0.8769531       m       17      TCCGA   TCCGA   C       C       false   false   true    0
e5085a3c-83ff-42c8-98e4-4bcb954f020a    162     22409   chr1    +       +       +       33      14      22272   22751   509     0.9707031       m       12      GGCGT   GGCGT   C       C       false   false   true    0

Seems like it should output a motifs column?

billytcl avatar Oct 18 '25 22:10 billytcl

Hello @billytcl,

This column will be included when more than one motif is used. So for example if you added --cpg --motif CGCG 0 you will get this column indicating which motif(s) match this position. I'll update the documentation, thanks for noticing.
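One quick way to check whether a given output file has the column is to look at its header row. This is a minimal sketch, assuming the output is tab-separated with a single header line, as in the previews in this thread. The function name `has_motifs_column` and the shortened sample headers are made up for illustration and are not part of modkit.

```python
def has_motifs_column(header_line: str) -> bool:
    """Return True if a tab-separated header line names a 'motifs' column."""
    columns = [name.strip() for name in header_line.rstrip("\n").split("\t")]
    return "motifs" in columns


# Shortened, hypothetical headers modelled on the previews in this thread.
header_without = "read_id\tref_position\tchrom\twithin_alignment\tflag\n"
header_with = "read_id\tref_position\tchrom\twithin_alignment\tflag\tmotifs\n"

print(has_motifs_column(header_without))
print(has_motifs_column(header_with))
```

For a real file, pass the first line only, for example `has_motifs_column(open("file.modkit.calls.tsv").readline())`.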

ArtRand avatar Oct 21 '25 21:10 ArtRand

Hi,

We are facing a similar issue: even though we specified multiple motifs, the final motifs column was not in the output. This is the command we used:

modkit extract calls --reference $ref -t 8 --log-filepath $out"/extract_calls.log" --motif CG 0  --motif CHG 0  --motif CHH 0 --mapped-only --region Chr1 --include-bed $bed_file $bam $out"/test_extract_calls_contexts.tsv" 

and this is a preview of the output:

read_id	forward_read_position	ref_position	chrom	mod_strand	ref_strand	ref_mod_strand	fw_soft_clipped_start	fw_soft_clipped_end	read_length	call_prob	call_code	base_qual	ref_kmer	query_kmer	canonical_base	modified_primary_base	fail	inferred	within_alignment	flag
0dab652e-1f72-4356-9e50-408651fc30cc	18381	394574	Chr1	+	+	+	0	0	27968	1	-	50	GTCTC	GTCTC	C	C	false	true	true	0
0dab652e-1f72-4356-9e50-408651fc30cc	18383	394576	Chr1	+	+	+	0	0	27968	1	-	50	CTCTT	CTCTT	C	C	false	true	true	0
0dab652e-1f72-4356-9e50-408651fc30cc	18386	394579	Chr1	+	+	+	0	0	27968	1	-	41	TTCTT	TTCTT	C	C	false	true	true	0
0dab652e-1f72-4356-9e50-408651fc30cc	18410	394603	Chr1	+	+	+	0	0	27968	1	-	35	GACTA	GACTA	C	C	false	true	true	0
0dab652e-1f72-4356-9e50-408651fc30cc	18431	394626	Chr1	+	+	+	0	0	27968	1	-	39	GACAT	GACAT	C	C	false	true	true	0
1f1c144e-01d7-4980-a8bc-7d98f4797f69	16165	394574	Chr1	+	+	+	0	0	17942	1	-	50	GTCTC	GTCTC	C	C	false	true	true	0
1f1c144e-01d7-4980-a8bc-7d98f4797f69	16167	394576	Chr1	+	+	+	0	0	17942	1	-	50	CTCTT	CTCTT	C	C	false	true	true	0
1f1c144e-01d7-4980-a8bc-7d98f4797f69	16170	394579	Chr1	+	+	+	0	0	17942	1	-	50	TTCTT	TTCTT	C	C	false	true	true	0
1f1c144e-01d7-4980-a8bc-7d98f4797f69	16194	394603	Chr1	+	+	+	0	0	17942	1	-	37	GACTA	GACTA	C	C	false	true	true	0
1f1c144e-01d7-4980-a8bc-7d98f4797f69	16221	394626	Chr1	+	+	+	0	0	17942	1	-	43	GACAT	GACAT	C	C	false	true	true	0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14	15815	394574	Chr1	+	+	+	0	0	24395	1	-	15	GTCTC	GTCTC	C	C	false	true	true	0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14	15817	394576	Chr1	+	+	+	0	0	24395	1	-	18	CTCTT	CTCTT	C	C	false	true	true	0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14	15820	394579	Chr1	+	+	+	0	0	24395	1	-	10	TTCTT	TTCTT	C	C	false	true	true	0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14	15844	394603	Chr1	+	+	+	0	0	24395	0.46484375	-	14	GACTA	GACTG	C	C	true	false	true	0
e848b05b-94c0-4b1c-bc34-7a2ad60faf14	15860	394626	Chr1	+	+	+	0	0	24395	1	-	7	GACAT	GACAT	C	C	false	true	true	0

jkh00 avatar Oct 30 '25 10:10 jkh00

@jkh00 could you tell me the output of modkit --version?

ArtRand avatar Oct 31 '25 23:10 ArtRand

hi, it was the older version indeed! (0.4.1)

we updated to the new version (0.5.1) and the column is showing now 👍
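Since the missing column came down to the installed version, a small helper can compare a version string against the two data points reported here. This is only a sketch. It assumes the text printed by `modkit --version` contains a dotted `major.minor.patch` number, and that exact format is not confirmed in this thread. The helper names are hypothetical. Only 0.4.1 (column missing) and 0.5.1 (column present) were reported, so every other version is treated as unknown.

```python
import re

# Versions whose behaviour was reported in this thread; nothing else is known.
REPORTED = {
    (0, 4, 1): "motifs column missing",
    (0, 5, 1): "motifs column present",
}


def parse_version(text: str) -> tuple:
    """Extract the first major.minor.patch triple found in a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def reported_status(version: tuple) -> str:
    """Return what this thread reported for a version, or 'unknown'."""
    return REPORTED.get(version, "unknown")


# "modkit 0.4.1" is an assumed output format, used here for illustration only.
print(reported_status(parse_version("modkit 0.4.1")))
print(reported_status(parse_version("modkit 0.5.1")))
```

To use it on a real install, capture the output of `modkit --version` (for example with `subprocess.run(["modkit", "--version"], capture_output=True, text=True).stdout`) and pass it to `parse_version`.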

jkh00 avatar Nov 10 '25 13:11 jkh00