dorado
Barcode-both-ends does not use rear barcode
Issue Report
Please describe the issue:
I'm trying to use the flag --barcode-both-ends. However, it seems to send all the reads to unclassified.fastq. When I run dorado with -vvv, it looks like it's only checking both barcodes against the front barcode.
[2024-02-23 17:05:27.932] [trace] Check double ends: top bc BC05, bottom bc BC06
[2024-02-23 17:05:27.932] [trace] Scores: 0.8 BC05, 0.6 BC06, 0.5 BC01, 0.5 BC03, 0.4 BC02, 0.4 BC08, 0.3 BC04, 0.3 BC07,
[2024-02-23 17:05:27.932] [trace] BC: unclassified
[2024-02-23 17:05:28.032] [info] > Simplex reads basecalled: 1
[2024-02-23 17:05:28.032] [info] > 1 reads demuxed @ classifications/s: 1.000000e+01
[2024-02-23 17:05:28.032] [info] > finished barcode demuxing
However, I have the rear barcodes defined in my .toml like:
barcode1_pattern = "BC%02i"
barcode2_pattern = "RC%02i"
Isn't the RC supposed to be compared to the rear barcode? Why isn't this single read passing? When I run without the flag --barcode-both-ends, this read is classified. What stops it from being classified here?
Steps to reproduce the issue:
Please list any steps to reproduce the issue.
Run environment:
- Dorado version: 0.5.3+d9af343
- Dorado command: dorado demux -vvv -t 8 --barcode-both-ends --barcode-sequences barcode_sequences.fa --barcode-arrangement twist-custom.toml --emit-fastq --output-dir single_read_test ./single_read.fastq.gz
- Operating system: 22.04.1-Ubuntu
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance):
- Source data location (on device or networked drive - NFS, etc.): fastq
Logs
[2024-02-23 17:05:27.932] [trace] Check double ends: top bc BC05, bottom bc BC06
[2024-02-23 17:05:27.932] [trace] Scores: 0.8 BC05, 0.6 BC06, 0.5 BC01, 0.5 BC03, 0.4 BC02, 0.4 BC08, 0.3 BC04, 0.3 BC07,
[2024-02-23 17:05:27.932] [trace] BC: unclassified
[2024-02-23 17:05:28.032] [info] > Simplex reads basecalled: 1
[2024-02-23 17:05:28.032] [info] > 1 reads demuxed @ classifications/s: 1.000000e+01
[2024-02-23 17:05:28.032] [info] > finished barcode demuxing
Hi @Afollet,
The trace showing scores only for the front barcode is slightly misleading here - for double ended barcodes we do check against both, but the trace output only lists the name of the top barcode. The score listed here is the "best" score from top and bottom of both the forward and reverse alignments, but this is not the only metric used to determine a barcode match. Specifically for --barcode-both-ends, there is an additional check that both the top and bottom scores are greater than min_hard_barcode_threshold, which is presumably what is failing here since the read classifies without this.
If you could provide us with your barcode configuration and an example failing read, we can investigate further.
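To make the extra check concrete, here is a minimal Python sketch of the rule as described above. It is not dorado's actual implementation: the `EndScores` type, the `classify_both_ends` function, and all score and threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EndScores:
    """Hypothetical per-barcode scores for the top and bottom of a read."""
    top: float
    bottom: float


def classify_both_ends(scores: dict[str, EndScores], hard_threshold: float) -> str:
    """Pick the barcode with the best single-end score, then require that
    BOTH of its end scores exceed hard_threshold (the both-ends check)."""
    if not scores:
        return "unclassified"
    # The "best" score is the higher of the two ends, as in the trace output.
    best = max(scores, key=lambda name: max(scores[name].top, scores[name].bottom))
    ends = scores[best]
    if ends.top > hard_threshold and ends.bottom > hard_threshold:
        return best
    return "unclassified"


# Illustrative values only: a strong top score paired with a weaker bottom score.
example = {"BC05": EndScores(top=0.8, bottom=0.5), "BC06": EndScores(top=0.6, bottom=0.4)}
print(classify_both_ends(example, hard_threshold=0.7))
print(classify_both_ends(example, hard_threshold=0.4))
```

In this sketch a read whose best score looks good can still end up unclassified when one end falls below the threshold, and lowering the threshold lets the same read through.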
@malton-ont I can't share too many details here.
I would request that logging be implemented for the rear barcode. It would make this much easier to debug. Both barcodes are perfect matches in the single read I am trying them on. However, it fails. It does succeed when I lower the min_hard_barcode_threshold. Does that imply that it is something other than a perfect match?
Hi @Afollet. Yes, we can improve that logging in a future release. If you don't want to wait you can compile dorado for yourself and add it in around here (PRs welcome!)
Without running the reads myself I can't be 100% sure, but that sounds likely. Remember that we expect to search for the RC of the second barcode on the forward strand - make sure your config has this correct! See CustomBarcodes.md
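A quick way to sanity-check the orientation outside dorado is to look for the configured rear barcode and its reverse complement near the end of the read. The Python sketch below uses exact string matching only, and the function names and the 150-base window are assumptions made for illustration, not dorado behaviour.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rear_barcode_orientation(read: str, barcode: str, window: int = 150) -> str:
    """Report how the configured rear barcode appears in the last `window`
    bases of the read, using exact matches only."""
    tail = read[-window:]
    if reverse_complement(barcode) in tail:
        return "reverse_complement"
    if barcode in tail:
        return "as_written"
    return "not_found"


# Toy sequences, not real barcodes.
print(rear_barcode_orientation("GGGGCGTT", "AACG"))
print(rear_barcode_orientation("GGGGAACG", "AACG"))
```

Going by the comment above, the reverse-complement case is what would be expected at the rear of a forward-strand read. If the barcode shows up as written instead, the sequence in the config may be in the wrong orientation; CustomBarcodes.md remains the authority on this.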