dorado
[improvement] dorado trim says 'basecalled' instead of the number of bases trimmed
Hi, I was just processing a bunch of reads with dorado trim (0.7.3) and noticed that it keeps saying:
[info] > Simplex reads basecalled: 456331
And if I turn verbose on:
[debug] Total reads processed: 456331
[info] > Simplex reads basecalled: 456331
That doesn't really make sense for the trimming module. I'd be much more interested if it could tell me something like:
x bp barcodeXX trimmed
x bp adapter trimmed
Cheers, Johannes
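Until trim reports this itself, a rough total can be computed outside dorado by comparing read lengths before and after trimming. The helper below is a hypothetical sketch (bases_trimmed is not part of dorado). It assumes both inputs are plain 4-line FASTQ files whose read IDs match; BAM output would need converting first, for example with samtools fastq. It reports one overall total, not the per-barcode and per-adapter breakdown requested above.

```shell
# Hypothetical helper (not a dorado command): report total bases removed by
# trimming by comparing per-read lengths in an untrimmed and a trimmed FASTQ.
# Assumes 4-line FASTQ records and matching read IDs in both files.
bases_trimmed() {
  local before="$1" after="$2"
  awk '
    # First file: record the original length of each read, keyed by read ID.
    FNR == NR {
      if (FNR % 4 == 1) { id = substr($1, 2) }
      else if (FNR % 4 == 2) { len[id] = length($0) }
      next
    }
    # Second file: add up how many bases each read lost.
    FNR % 4 == 1 { id = substr($1, 2) }
    FNR % 4 == 2 && (id in len) { total += len[id] - length($0); n++ }
    END { printf "%d bp trimmed across %d reads\n", total, n }
  ' "$before" "$after"
}
```

Usage, with placeholder file names: bases_trimmed untrimmed.fastq trimmed.fastq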
Thanks for bringing this to our attention - we'll update this.
It is also still leaving lots of barcodes on the reads. At this point I'm probably just going to hard-trim 100 bp off each end instead of relying on dorado trim.
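For reference, the hard-trim fallback described here can be sketched as a small awk filter over FASTQ. This is a hypothetical helper (hard_trim_fastq is not a dorado command) that assumes 4-line FASTQ records; it trims the quality string in step with the sequence and drops reads too short to survive the trim. Unlike adapter-aware trimming, it also removes genuine sequence from reads whose ends never carried adapters or barcodes.

```shell
# Hypothetical helper (not a dorado command): remove a fixed number of bases
# from both ends of every read in a 4-line FASTQ stream read from stdin.
# The quality string is trimmed identically; reads that would end up empty
# are dropped.
hard_trim_fastq() {
  local n="$1"
  awk -v n="$n" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      keep = length(seq) - 2 * n   # bases left after trimming both ends
      if (keep > 0) {
        print header
        print substr(seq, n + 1, keep)
        print plus
        print substr($0, n + 1, keep)
      }
    }
  '
}
```

Usage, with placeholder file names: hard_trim_fastq 100 < reads.fastq > hard_trimmed.fastq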
I just had another read of the main page. Am I understanding correctly that dorado trim only trims adapters and primers when I feed it an already demultiplexed FASTQ file, and that it isn't checking for barcodes?
If so, could this please be added?
@JWDebler,
The logging message has been updated in dorado 0.9.5, just released.
dorado trim does not remove barcodes - to do this would require detecting barcodes. If your barcodes are still present you can just run the file through dorado demux and reclassify them to apply trimming.
Thanks @malton-ont. To be honest, I have given up on dorado trim and am now just hard-trimming 75 bp from each end of every read. Too often I ended up with adapters and barcodes in my assemblies. So what would be the correct order for trimming adapters and barcodes?
My order so far is:
- Basecall all reads with the --kit-name option to classify them
- Demultiplex reads with --no-classify (as they are already classified)
- Trim
Are you saying I should add another classification step afterwards to trim the barcodes? It seems odd to have a trimming feature that doesn't trim barcodes as well.
@JWDebler,
You shouldn't need trim in the setup you've described - the first step should trim barcodes (as you're classifying) as well as adapters and primers (even from unclassified reads). You should only need trim if you've a) not used any barcodes or b) want to remove the adapters/primers from unclassified reads that were called with the --no-trim flag.
# 1
dorado basecaller --kit-name <kit> ... > trimmed_calls.bam # all reads trimmed already
dorado demux --no-classify trimmed_calls.bam -o trimmed_demux # no further trimming required
# 2
dorado basecaller --kit-name <kit> --no-trim > untrimmed_calls.bam # reads not trimmed
dorado demux --kit-name <kit> untrimmed_calls.bam -o trimmed_demux # trims barcodes, including outboard adapters and primers
dorado trim trimmed_demux/<run_id>_unclassified.bam > trimmed_unclassified.bam # trim adapters and primers from unclassified reads
# 3
dorado basecaller --no-trim > untrimmed_calls.bam # reads not trimmed, no barcodes!
dorado trim untrimmed_calls.bam > trimmed.bam # trim adapters and primers, no barcodes present
Note that in case 2 it's safe to reclassify since we didn't trim the barcodes in the first place. This is not the recommended workflow though, since it wastes time classifying twice - this case should really not bother with classifying during basecalling.
trim has never been intended to handle barcodes - it has only ever been intended to remove adapters and primers. We could probably extend it to look for the barcode of each read as stated in the BAM file BC tag, but this wouldn't work for FASTQ/FASTA input since those files don't store the barcode information.
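As a sketch of what reading the BC tag involves, the hypothetical helper below (barcode_per_read is not a dorado command) pulls the BC:Z: value for each record out of SAM text, such as the output of samtools view on a dorado BAM. The tag name comes from the comment above; the barcode string in the test data is made up. It also shows the limitation mentioned: FASTQ/FASTA records have no such field, so there is nothing to look up.

```shell
# Hypothetical helper (not a dorado command): read SAM text on stdin and print
# "read_id<TAB>barcode" per record, taken from the BC:Z: optional tag.
# Records without the tag are reported as "none".
barcode_per_read() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      bc = "none"
      for (i = 12; i <= NF; i++) {     # optional tags start at column 12
        if (substr($i, 1, 5) == "BC:Z:") { bc = substr($i, 6); break }
      }
      print $1 "\t" bc
    }
  '
}
```

Usage, with a placeholder file name: samtools view calls.bam | barcode_per_read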
Hmm, ok. My setup is slightly different.
- Basecall (hac model) with classification
- Demux
- Extract read IDs from the demuxed reads
- Use the IDs to split the POD5 files by barcode and sort them by channel
- Duplex-call the barcode-separated POD5s with the sup model (duplex calling doesn't trim simplex reads)
- Extract simplex and duplex reads
- Trim (but since at this point the reads still have adapters and barcodes, dorado trim doesn't help, so I hard-trim)
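The "extract read IDs" and "extract simplex and duplex reads" steps can both be driven from SAM text. The helper below is a hypothetical sketch (classify_duplex is not a dorado command) that assumes the dx:i: tag values described in dorado's duplex documentation: 1 for duplex reads, 0 for simplex reads without duplex offspring, and -1 for simplex reads that have duplex offspring. Please verify these values against the documentation for your dorado version.

```shell
# Hypothetical helper (not a dorado command): read SAM text on stdin and print
# "read_id<TAB>class" based on the dx:i: tag (assumed convention):
#   dx:i:1  -> duplex
#   dx:i:0  -> simplex (no duplex offspring)
#   dx:i:-1 -> simplex_parent (simplex read with duplex offspring)
classify_duplex() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      cls = "untagged"
      for (i = 12; i <= NF; i++) {     # optional tags start at column 12
        if ($i == "dx:i:1")       { cls = "duplex"; break }
        else if ($i == "dx:i:0")  { cls = "simplex"; break }
        else if ($i == "dx:i:-1") { cls = "simplex_parent"; break }
      }
      print $1 "\t" cls
    }
  '
}
```

Usage, with placeholder file names, to collect duplex read IDs: samtools view duplex_calls.bam | classify_duplex | awk '$2 == "duplex" { print $1 }' > duplex_ids.txt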
I guess since duplex is on the chopping block anyway, I can't expect much more development in that direction.
Cheers anyways 😊
On Wed, 2 Apr 2025, 16:00, malton-ont wrote:
https://github.com/nanoporetech/dorado/issues/988#issuecomment-2771668766