Untrimmed adapters following dorado demultiplexing

Open ezherman opened this issue 1 year ago • 0 comments

Issue Report

Please describe the issue:

I am getting in touch to ask for details on Dorado's barcode trimming efficiency. In some of my samples, porechop_abi still identifies reads with native adapter sequences after demultiplexing (e.g. 6% of reads with front adapters and 3% with reverse adapters). I have never found porechop_abi to identify the wrong adapter, which gives me some confidence that its inference is correct.

Please provide a clear and concise description of the issue you are seeing and the result you expect.

The issue is described above. I would expect porechop_abi to identify no reads with adapters following demultiplexing. I am wondering whether it is expected to see some reads with untrimmed adapters, and if so, whether I can do anything to reduce the proportion of reads that are not properly trimmed.
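For illustration, the kind of check described above (what fraction of reads still carry an adapter at either end) can be roughly sketched in Python. This is only a minimal sketch and not how porechop_abi works: it uses exact substring matching rather than alignment, `EXAMPLE_ADAPTER` is a made-up placeholder rather than a real ONT adapter or barcode sequence, and the function names and the `window` parameter are hypothetical.

```python
from typing import Iterable, Tuple

# Placeholder sequence for illustration only; NOT a real ONT adapter or barcode.
EXAMPLE_ADAPTER = "AAGGTTCCAAGG"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_fractions(
    reads: Iterable[str], adapter: str, window: int = 150
) -> Tuple[float, float]:
    """Return (front_fraction, rear_fraction) of reads with an exact adapter hit.

    A read counts as a front hit if `adapter` occurs within its first `window`
    bases, and as a rear hit if the reverse complement of `adapter` occurs
    within its last `window` bases. Exact matching will undercount on noisy
    reads, so treat the result as a lower bound.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    adapter_rc = revcomp(adapter)
    total = front = rear = 0
    for read in reads:
        total += 1
        if adapter in read[:window]:
            front += 1
        if adapter_rc in read[-window:]:
            rear += 1
    if total == 0:
        return 0.0, 0.0
    return front / total, rear / total


if __name__ == "__main__":
    example_reads = [
        EXAMPLE_ADAPTER + "T" * 50,           # adapter left at the front
        "G" * 50 + revcomp(EXAMPLE_ADAPTER),  # adapter left at the rear
        "C" * 60,                             # clean read
        "A" * 60,                             # clean read
    ]
    print(adapter_fractions(example_reads, EXAMPLE_ADAPTER, window=20))
```

In practice the reads would come from the demultiplexed FASTQ or BAM files, and an alignment-based tool such as porechop_abi remains the more reliable way to detect adapters in error-prone reads.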

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

Dorado version: 7.3.11
Dorado command: basecalling as part of MinKNOW (trim barcodes, barcodes both ends and mid-read barcode filtering)
Operating system: Windows
Hardware (CPUs, Memory, GPUs):
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): on device drive
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): FLO-PRO114M, SQK-NBD114-96
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue): I'm unsure how to generate a pod5 subset, but I'll be happy to look into this if that would be of help

Logs

Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

Aug 07 '24 15:08 ezherman