Untrimmed adapters following Dorado demultiplexing
Issue Report
Please describe the issue:
I am getting in touch to ask about the efficiency of Dorado's barcode trimming. In some of my samples, porechop_abi still identifies reads with native adapter sequences after demultiplexing (e.g. 6% of reads with front adapters and 3% with reverse adapters). I have never seen porechop_abi identify the wrong adapter, which gives me some confidence that its detections are correct.
Please provide a clear and concise description of the issue you are seeing and the result you expect.
The issue is described above. I would expect porechop_abi to identify no reads with adapters following demultiplexing. I am wondering whether it is expected to see some reads with untrimmed adapters, and if so, whether I can do anything to reduce the proportion of reads that are not properly trimmed.
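As an independent cross-check of the porechop_abi percentages, a crude scan along the following lines could be run on the demultiplexed FASTQ. This is only a minimal sketch under stated assumptions, not how Dorado or porechop_abi detect adapters: the function names are hypothetical, the motif strings passed in are placeholders that would need to be replaced with the real kit sequences, and the matching tolerates substitutions only, so adapters containing insertions or deletions would be missed.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = next(it, "").strip().upper()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line (unused here)
        yield header.strip()[1:].split()[0], seq


def has_approx_match(window: str, motif: str, max_mismatches: int) -> bool:
    """True if motif occurs in window with at most max_mismatches substitutions."""
    m = len(motif)
    for start in range(len(window) - m + 1):
        mismatches = 0
        for a, b in zip(window[start:start + m], motif):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return True
    return False


def adapter_fractions(
    records: Iterable[Tuple[str, str]],
    front_motif: str,  # placeholder: sequence expected near the read start
    rear_motif: str,   # placeholder: sequence expected near the read end
    window: int = 150,
    max_mismatches: int = 2,
) -> Tuple[float, float]:
    """Return (fraction with front motif, fraction with rear motif)."""
    total = front = rear = 0
    for _, seq in records:
        total += 1
        if has_approx_match(seq[:window], front_motif, max_mismatches):
            front += 1
        if has_approx_match(seq[-window:], rear_motif, max_mismatches):
            rear += 1
    if total == 0:
        return 0.0, 0.0
    return front / total, rear / total
```

Calling `adapter_fractions(read_fastq(open(path)), front, rear)` on an uncompressed FASTQ would give two proportions that can be compared against the porechop_abi figures. A large disagreement would suggest the detections depend on the matching method rather than on genuinely untrimmed sequence.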
Steps to reproduce the issue:
Please list any steps to reproduce the issue.
Run environment:
- Dorado version: 7.3.11
- Dorado command: basecalling as part of MinKNOW (trim barcodes, barcodes both ends, and mid-read barcode filtering)
- Operating system: Windows
- Hardware (CPUs, Memory, GPUs):
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.): on device drive
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): FLO-PRO114M, SQK-NBD114-96
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue): I'm unsure how to generate a pod5 subset, but I'll be happy to look into this if that would be of help
Logs
- Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)