100% skipped reads
Dear developer
All of my reads were skipped after using modkit. Here are my steps.
-
The data were re-basecalled (using Dorado) in SUP mode $ data/dorado-0.9.1-linux-x64/bin/dorado basecaller --device cuda:all --no-trim --modified-bases-models /mnt/DataStore/Tools/dorado-0.9.1_profiles/V5/[email protected]_4mC_5mC@v3 /mnt/DataStore/Tools/dorado-0.9.1_profiles/V5/[email protected] /mnt/DataStore/all_3_first_run > 4mC_5mC_SUP
-
The reads below Q 10 were discarded, and high-quality reads barcoded using Dorado demux.
-
Reads from each barcode were mapped against the reference genome using minimap2.
-
Bam files were sorted and indexed using samtools.
-
modkit was used on the indexed BAM file: $modkit pilup B01.sorted.bam B01.bed --log-filepath pileup.log
calculated chunk size: 6, interval size 100000, processing 600000 positions concurrently attempting to sample 10042 reads Done, processed 0 rows. Processed ~0 reads and skipped ~269075 reads.
And
$ samtools view -F 4 B01.sorted.bam | wc -l 439548
And
$ samtools view B01.sorted.bam | grep -m 1 -E '\sMM:Z:' || echo "no MM tags found" no MM tags found
By looking at the log file, I think the issue is MM tags.
[modkit-core/src/reads_sampler/mod.rs::126][2025-09-07 01:46:49][DEBUG] sampled 0 records [modkit-core/src/interval_chunks.rs::531][2025-09-07 01:46:49][DEBUG] there are 1958 contig(s) to work on (1958 parts) [modkit-core/src/read_cache.rs::329][2025-09-07 01:46:49][DEBUG] 9d21257a-e035-436c-965d-e23f77e0838e: MM-tag-missing [modkit-core/src/read_cache.rs::329][2025-09-07 01:46:49][DEBUG] f7f4e4b5-8917-4391-b9f9-9b253463250f: MM-tag-missing [modkit-core/src/read_cache.rs::329][2025-09-07 01:46:49][DEBUG] 6a79da85-de55-4b01-996b-4eb8b1115697: MM-tag-missing
I suppose the issue is related to my basecalling, which I followed here https://github.com/nanoporetech/dorado/issues/1302#issue-2955430954
I got the same issue when I used the Dorado aligner too (instead of Minimap2).
Do you have any idea how I can overcome this issue?
Thanks for your help
Update
I noticed that ($ samtools view B01.sorted.bam | grep -m 1 -E '\sMM:Z:' || echo "no MM tags found"), my raw reads before de-multiplexing using Durado, do contain modified bases. However, the demux function will drop those modified bases. I used the Rapid Barcoding Kit for my sequencing, and de-multiplexing is quite important for me. Does it mean that de-multiplexing is not compatible with modified base calling?
Alternatively, can I run modkit pilup on my raw data before de-multiplexing, and then from the bed file, only pick reads whose quality is over 10 (Q is reported in the BAM file from Dorado)? If so, then I can probably collect reads for each barcode from the bed file; I just need to match the read ID with the one that was demultiplexed during the process.
Appreciate your comment.
Hello @Hedi65, sorry about the delay.
However, the demux function will drop those modified bases. I used the Rapid Barcoding Kit for my sequencing, and de-multiplexing is quite important for me. Does it mean that de-multiplexing is not compatible with modified base calling?
Barcoding is certainly compatible with modified base calling.
I don't think that dorado demux will discard base modification tags. Here is a way to check what's going on: after each step, run modkit modbam check-tags $bam --head 200 this command will report out how many reads have errors and exit non-0 if so. If the demuxed BAMs are missing tags (when the input BAM had them), please bring that up with the dorado team since these tags should be preserved. You could use modkit repair to put the tags back (docs).
If I had to guess, I bet you're loosing the tags at the minimap step, could you try using dorado aligner instead?
Alternatively, can I run modkit pilup on my raw data before de-multiplexing, and then from the bed file, only pick reads whose quality is over 10 (Q is reported in the BAM file from Dorado)?
Pileup doesn't report read-level information so you can't demux the bedMethyl like this. I'd strongly recommend figuring out which step is loosing the tags. If it's the demux step - that should be fixed in dorado and can be patched over with modkit repair. If you're loosing the tags after alignment with minimap, use Dorado aligner instead.
Thanks for the reply
Here is the outcome after demux
/mnt/DataStore/Processed_data/User/Bir/dist_modkit_v0.5.0_5120ef7/modkit modbam check-tags /mnt/DataStore/Processed_data/User/Bir/Barcoded_4mc_5mc_bam_not_filtered/unknown_run_id_SQK-RBK114-24_barcode01.bam --head 200
checking tags on first 200 reads input modBAM contains 200 (100.00%) failed records num PASS records: 0 (0.00%) num records: 200 errors: +----------------+-------+--------+ | error | count | pct | +----------------+-------+--------+ | MM-tag-missing | 200 | 100.00 | | total | 200 | 100 | +----------------+-------+--------+
modified bases: +--------+--------------+----------+------+ | strand | primary_base | mod_code | mode | +--------+--------------+----------+------+ +--------+--------------+----------+------+
Here is the outcome before demux
/mnt/DataStore/Processed_data/User/Bir/dist_modkit_v0.5.0_5120ef7/modkit modbam check-tags /mnt/DataStore/Processed_data/User/Bir/BAM_fastq_from_Dorado/4mC_5mC_SUP.bam --head 200
checking tags on first 200 reads no errors num PASS records: 200 (100.00%) num records: 200 valid record tag headers: +------------+-------+ | tag_header | count | +------------+-------+ | C+21839. | 200 | | C+m. | 200 | +------------+-------+
modified bases: +--------+--------------+----------+------+ | strand | primary_base | mod_code | mode | +--------+--------------+----------+------+ | + | C | 21839 | . | | + | C | m | . | +--------+--------------+----------+------+
I just followed what Dorado demux suggested; I even enabled the no-trim option as well as no filtration (quality or length) at the demux level.
I passed the basecalled BAM file before demux to the dorado aligner, and then I used modkit Pileup . It worked, and a 130 GB bed file was generated. Yet at this moment, we can't see which reads belong to which barcode.
Hi @Hedi65,
I'm not able to reproduce mod tags being stripped by dorado demux. Using the test data from the dorado repo, and using various versions from 0.9.1 to 1.1.1:
dorado basecaller --no-trim hac,5mC_5hmC \
tests/data/barcode_demux/read_group_test/SQK-RBK114_BC01_BC04_unclassified.pod5 > calls.bam
dorado demux --kit-name SQK-RBK114-96 -o demuxed calls.bam
samtools view demux-091/0d85015e-6a4e-400c-a80f-c187c65a6d03_SQK-RBK114-96_barcode01.bam | grep -m 1 -E '\sMM:Z:' | cut -f 1
gives
1311ecda-0649-46ad-988d-307a5f4e3bd6
i.e. it's found a read with the MM tags present.
Can you please provide your exact commands for each step?
Hello @Hedi65, sorry about the delay.
However, the demux function will drop those modified bases. I used the Rapid Barcoding Kit for my sequencing, and de-multiplexing is quite important for me. Does it mean that de-multiplexing is not compatible with modified base calling?
Barcoding is certainly compatible with modified base calling.
I don't think that
dorado demuxwill discard base modification tags. Here is a way to check what's going on: after each step, runmodkit modbam check-tags $bam --head 200this command will report out how many reads have errors and exit non-0 if so. If the demuxed BAMs are missing tags (when the input BAM had them), please bring that up with thedoradoteam since these tags should be preserved. You could usemodkit repairto put the tags back (docs).If I had to guess, I bet you're loosing the tags at the minimap step, could you try using
dorado alignerinstead?Alternatively, can I run modkit pilup on my raw data before de-multiplexing, and then from the bed file, only pick reads whose quality is over 10 (Q is reported in the BAM file from Dorado)?
Pileup doesn't report read-level information so you can't demux the bedMethyl like this. I'd strongly recommend figuring out which step is loosing the tags. If it's the demux step - that should be fixed in dorado and can be patched over with
modkit repair. If you're loosing the tags after alignment with minimap, use Dorado aligner instead.
Thanks for your reply. My data is coming from 3 different runs; something happened during the merging of the BAM files from other runs. After going through all the steps, I can confirm that modified bases are also detected after demux.
Hi @Hedi65,
I'm not able to reproduce mod tags being stripped by dorado demux. Using the test data from the dorado repo, and using various versions from 0.9.1 to 1.1.1:
dorado basecaller --no-trim hac,5mC_5hmC \ tests/data/barcode_demux/read_group_test/SQK-RBK114_BC01_BC04_unclassified.pod5 > calls.bam dorado demux --kit-name SQK-RBK114-96 -o demuxed calls.bam samtools view demux-091/0d85015e-6a4e-400c-a80f-c187c65a6d03_SQK-RBK114-96_barcode01.bam | grep -m 1 -E '\sMM:Z:' | cut -f 1gives
1311ecda-0649-46ad-988d-307a5f4e3bd6i.e. it's found a read with the MM tags present.
Can you please provide your exact commands for each step?
basecalling
/dorado-0.9.1-linux-x64/bin/dorado basecaller --device cuda:all --no-trim --min-qscore 10 --modified-bases-models /path/to/modified/basecalling/profile /path/to/basecalling/profile /mnt/DataStore/Raw_data/pod5_all_3_first_run > 4mC_5mC_SUP
barcoding
/dorado-0.9.1-linux-x64/bin/dorado demux --kit-name SQK-RBK114-24 --output-dir /mnt/DataStore/Processed_data/4mC_5mC_SUP_barcoded /mnt/DataStore/Processed_data/4mC_5mC_SUP.bam --no-trim
Thanks for your reply. My data is coming from 3 different runs; something happened during the merging of the BAM files from other runs. After going through all the steps, I can confirm that modified bases are also detected after demux.
Can you confirm that the tags are present before and after each dorado command - please provide every command you are running, and check the state of the tags at each step. If the tags are present after demux then the most likely thing I can see is that the tags are stripped due to hard-clipping during alignment. In this case, you can turn on soft-clipping with the --mm2-opts -Y parameter to dorado aligner (or just -Y to minimap2).
Thanks for your reply. My data is coming from 3 different runs; something happened during the merging of the BAM files from other runs. After going through all the steps, I can confirm that modified bases are also detected after demux.
Can you confirm that the tags are present before and after each dorado command - please provide every command you are running, and check the state of the tags at each step. If the tags are present after demux then the most likely thing I can see is that the tags are stripped due to hard-clipping during alignment. In this case, you can turn on soft-clipping with the
--mm2-opts -Yparameter todorado aligner(or just-Yto minimap2).
Thanks for your reply. The basecalling and barcoding commands are mentioned above. Bam files corresponding to the same esample across different runs were merged with samtools
$ samtools merge -@ 8 "$out" $files
And each merged BAM file was mapped against the bovine reference genome using the Dorado aligner.
$ /dorado aligner /mnt/DataStore/Bovine.fa /mnt/DataStore/demux_4mC_5mC_Q10_1/merged_barcode01.bam --emit-summary -t 20 -o /mnt/DataStore/Processed_data/4mc_5mc_mapping_against_ref_B01
checking tags right after basecalling
$ /mnt/DataStore/A_dist_modkit_v0.5.1_8fa79e3/modkit modbam check-tags /mnt/DataStore/Basecalled/4mC_5mC_SUP_Q10 --head 200
checking tags on first 200 reads no errors num PASS records: 200 (100.00%) num records: 200 valid record tag headers: +------------+-------+ | tag_header | count | +------------+-------+ | C+21839. | 200 | | C+m. | 200 | +------------+-------+
checking tags after merging the barcoded BAM files
$ /mnt/DataStore/A_dist_modkit_v0.5.1_8fa79e3/modkit modbam check-tags /mnt/DataStore/demux_4mC_5mC_Q10_1/merged_barcode01.bam --head 200
checking tags on first 200 reads no errors num PASS records: 200 (100.00%) num records: 200 valid record tag headers: +------------+-------+ | tag_header | count | +------------+-------+ | C+21839. | 200 | | C+m. | 200 | +------------+-------+
**checking tags after mapping the merged BAM file against the ref genome **
$ /mnt/DataStore/A_dist_modkit_v0.5.1_8fa79e3/modkit modbam check-tags /mnt/DataStore/4mc_5mc_mapping_against_ref_B01/merged_barcode01.bam --head 200
checking tags on first 200 reads no errors num PASS records: 200 (100.00%) num records: 200 valid record tag headers: +------------+-------+ | tag_header | count | +------------+-------+ | C+21839. | 200 | | C+m. | 200 | +------------+-------+
modified bases: +--------+--------------+----------+------+ | strand | primary_base | mod_code | mode | +--------+--------------+----------+------+ | + | C | 21839 | . | | + | C | m | . | +--------+--------------+----------+------+
Hi @Hedi65,
So, to confirm:
- dorado basecaller 🆗
- dorado demux 🆗
- samtools merge 🆗
- dorado aligner ❌ ?
Your current dorado aligner command will strip modbase tags as it is hard-clipping. This command needs the --mm2-opts -Y flags to turn on soft-clipping as mentioned above:
/dorado aligner \
/mnt/DataStore/Bovine.fa \
/mnt/DataStore/demux_4mC_5mC_Q10_1/merged_barcode01.bam
--emit-summary \
-t 20 \
--mm2-opts -Y \
-o /mnt/DataStore/Processed_data/4mc_5mc_mapping_against_ref_B01
Hi @Hedi65,
So, to confirm:
- dorado basecaller 🆗
- dorado demux 🆗
- samtools merge 🆗
- dorado aligner ❌ ?
Your current
dorado alignercommand will strip modbase tags as it is hard-clipping. This command needs the--mm2-opts -Yflags to turn on soft-clipping as mentioned above:/dorado aligner \ /mnt/DataStore/Bovine.fa \ /mnt/DataStore/demux_4mC_5mC_Q10_1/merged_barcode01.bam --emit-summary \ -t 20 \ --mm2-opts -Y \ -o /mnt/DataStore/Processed_data/4mc_5mc_mapping_against_ref_B01
thanks again
After enabling the --mm2-opts -Y flag, I can confirm that tags are present, although they are also present without the flag.
$ /mnt/DataStore/A_dist_modkit_v0.5.1_8fa79e3/modkit modbam check-tags /mnt/DataStore/Processed_data/4mc_5mc_mapping_against_ref_B01/merged_barcode01.bam --head 200
checking tags on first 200 reads no errors num PASS records: 200 (100.00%) num records: 200 valid record tag headers: +------------+-------+ | tag_header | count | +------------+-------+ | C+21839. | 200 | | C+m. | 200 | +------------+-------+
modified bases: +--------+--------------+----------+------+ | strand | primary_base | mod_code | mode | +--------+--------------+----------+------+ | + | C | 21839 | . | | + | C | m | . | +--------+--------------+----------+------+
Hi @Hedi65,
I don't think I'm clear on what the problem is then - your original issue said that you were seeing MM-tag-missing messages, but we've now confirmed that the tags are in fact present.
If the dorado commands are correctly generating and preserving the tags then there is no dorado issue and I'm going to hand this back to @ArtRand to determine what the problem is on the modkit side.
Thank you for being so helpful; I appreciate it.
I'm pretty sure that the problem originated from merging the BAM files. One of the students used cat to merge the BAM instead of samtools.