MN tag length and seq length don't match for 99.9% of reads

Open ddubocan opened this issue 1 year ago • 3 comments

I am running modkit (v0.2.5) directly on the output of dorado v0.5.1:

modkit adjust-mods --log-filepath log_modkit_adjust -t 10 --edge-filter 10,10 barcode12.bam barcode12.adjust.bam

Only 30 out of 100,000 reads pass; all the remaining reads fail. The log file shows:

[src/command_utils.rs::259][2024-02-13 17:05:00][INFO] filtering out base modification calls 10 bases from the start and 10 bases from the end of each read
[src/adjust.rs::145][2024-02-13 17:05:00][DEBUG] read adc5febc-2870-42c5-91b9-560c529a1498 failed, MN tag length 8720 and seq length 8630 don't match
[src/adjust.rs::145][2024-02-13 17:05:00][DEBUG] read 3e20075d-60bf-4499-ac68-6a44de8864c5 failed, MN tag length 9123 and seq length 9083 don't match
[src/adjust.rs::145][2024-02-13 17:05:00][DEBUG] read 23407d2c-2e57-4130-8ad0-ef2b4d18b81b failed, MN tag length 961 and seq length 922 don't match
[src/adjust.rs::145][2024-02-13 17:05:00][DEBUG] read 30660983-6a76-4d63-97d3-9eec6de3e74e failed, MN tag length 3841 and seq length 3740 don't match

This does not occur when I use modkit (v0.13); nearly all reads get processed.
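
To quantify the failures, the DEBUG lines above can be tallied with a short script. This is a minimal sketch using only the Python standard library; it assumes the log format is exactly the one shown in this issue (modkit v0.2.5), and the names `MISMATCH_RE` and `summarize_mismatches` are hypothetical, not part of modkit.

```python
import re
from collections import Counter

# Matches the DEBUG lines shown in this issue; other modkit versions
# may word the message differently (assumption).
MISMATCH_RE = re.compile(
    r"read (?P<read_id>\S+) failed, "
    r"MN tag length (?P<mn>\d+) and seq length (?P<seq>\d+) don't match"
)


def summarize_mismatches(lines):
    """Return (failed_read_count, Counter of MN-minus-seq-length differences)."""
    failed = 0
    diffs = Counter()
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match is None:
            continue  # INFO lines and unrelated messages are skipped
        failed += 1
        diffs[int(match["mn"]) - int(match["seq"])] += 1
    return failed, diffs


# Two of the log lines from this issue, used as sample input.
sample = [
    "[src/adjust.rs::145][2024-02-13 17:05:00][DEBUG] read "
    "adc5febc-2870-42c5-91b9-560c529a1498 failed, "
    "MN tag length 8720 and seq length 8630 don't match",
    "[src/adjust.rs::145][2024-02-13 17:05:00][DEBUG] read "
    "3e20075d-60bf-4499-ac68-6a44de8864c5 failed, "
    "MN tag length 9123 and seq length 9083 don't match",
]
failed, diffs = summarize_mismatches(sample)
print(failed, dict(diffs))
```

For a real run, pass the open log file (here `log_modkit_adjust`) instead of `sample`. The size of each difference shows how many bases were removed from the read after the MN tag was written.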

ddubocan, Feb 14 '24 01:02

Hello @ddubocan,

The problem is that Dorado 0.5.1 did not set the MN tag correctly. Normally I would say that you could manually fix the MN tag with something like pysam, but the barcoding itself was also off (see the changelog). My colleagues have told me that the best thing to do is to re-basecall with Dorado 0.5.2, where the issue is fixed. Please let me know if this doesn't fix the error.
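
To confirm whether a BAM is affected, before or after re-basecalling, the MN tag can be compared with the stored sequence length for each read. The following is a rough sketch with pysam, meant only as a check and not as a fix. It assumes MN holds the sequence length at the time the MM/ML tags were produced, as described in the SAM optional tags specification. The function names `count_mn_mismatches` and `check_bam` are hypothetical.

```python
def count_mn_mismatches(reads):
    """Count reads whose MN tag disagrees with the stored sequence length.

    `reads` is any iterable of pysam.AlignedSegment-like objects.
    Returns (total_reads, reads_without_mn, reads_with_mismatch).
    """
    total = missing = mismatched = 0
    for read in reads:
        total += 1
        if not read.has_tag("MN"):
            missing += 1
        elif read.get_tag("MN") != read.query_length:
            # query_length is the length of SEQ as stored in the record
            mismatched += 1
    return total, missing, mismatched


def check_bam(path):
    """Open a BAM (aligned or unaligned) and count MN/sequence mismatches."""
    import pysam  # third-party; imported here so the helper above stays dependency-free

    # check_sq=False lets pysam open unaligned BAMs that have no @SQ header lines
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        # until_eof=True reads every record without needing an index
        return count_mn_mismatches(bam.fetch(until_eof=True))
```

For example, `check_bam("barcode12.bam")` would return the three counts for the file from this issue. A non-zero third value means `modkit adjust-mods` is likely to reject those reads.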

A

ArtRand, Feb 14 '24 14:02

Thank you so much for the quick response! I will update to the most recent build of dorado and try again.

ddubocan, Feb 14 '24 15:02

@ddubocan any luck?

ArtRand, Feb 16 '24 03:02