
Empty values for columns "template_start" and "template_duration" in summary for barcoded reads

Open luigilamparelli opened this issue 1 year ago • 1 comments

Issue Report

Please describe the issue:

Hello, I've noticed that, after demultiplexing, some reads have missing values in the "template_start" and "template_duration" columns of the summary produced by dorado summary. The same reads have these columns filled before barcoding, and I would expect the same after barcoding and demultiplexing.

Steps to reproduce the issue:

  1. Basecalling without barcoding: dorado basecaller -vv [email protected] ${pod5} > ${bam}
  2. Demultiplexing: dorado demux -vv --output-dir ${out_dir}/ --kit-name SQK-NBD114-96 --emit-summary ${bam}
  3. Summary on barcoded reads: dorado summary ${out_dir}/${barcoded_bam} > summary_barcoded.txt
  4. Summary on the original, non-barcoded reads: dorado summary ${bam} > summary_not_barcoded.txt
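To list the affected reads from a summary file, something like the following may help. It is a minimal sketch: the function name `empty_template_reads` is hypothetical, and it assumes the summary is tab-separated with a header row, has a `read_id` column, and writes the missing values as literally empty fields.

```shell
# Print the read IDs whose template_start or template_duration field is empty.
# Assumes a tab-separated summary with a header row and a read_id column.
empty_template_reads() {
  awk -F'\t' '
    NR == 1 {
      # Map column names to positions so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("read_id" in col) || !("template_start" in col) || !("template_duration" in col))
        exit 1
      next
    }
    $(col["template_start"]) == "" || $(col["template_duration"]) == "" {
      print $(col["read_id"])
    }
  ' "$1"
}
```

For example, `empty_template_reads summary_barcoded.txt` would print one read ID per line, and the same call on summary_not_barcoded.txt should print nothing if the columns are filled there.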

Run environment:

  • Dorado version: 0.6.1+79b5da5

Logs

You will find attached the log files for the 3 dorado commands. In addition, I would like to share the tags of one of the affected reads (0a0eb075-0cd0-41f6-9244-f9b5c4d17e61), both before and after barcoding. Two tags differ:

  • ns:i: is 171575 in the non-barcoded file and 0 in the barcoded one. The missing values may be caused by a division by zero: changing this tag to any positive integer makes dorado summary write numbers in the columns mentioned above.
  • ts:i: is 0 in both files, but in the barcoded file it has moved to the end of the tag list. I don't think this is relevant to this issue.
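The two tags can be compared for any read with a small filter over SAM text. This is only a sketch: the helper name `read_sample_tags` is hypothetical, and it assumes the BAM has first been converted to SAM text, for instance with `samtools view`.

```shell
# Print the ns:i and ts:i tags of one read, given SAM text on stdin.
# Usage (assuming samtools is installed):
#   samtools view "${bam}" | read_sample_tags 0a0eb075-0cd0-41f6-9244-f9b5c4d17e61
read_sample_tags() {
  awk -F'\t' -v id="$1" '
    $1 == id {
      # Optional tags start at the 12th field of a SAM record.
      for (i = 12; i <= NF; i++)
        if ($i ~ /^(ns|ts):i:/) print $i
    }
  '
}
```

Running it on both the barcoded and the non-barcoded file for the same read ID shows the two tags side by side, in the order in which they appear in each record.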

bam_tags_barcoded.txt bam_tags_not_barcoded.txt dorado_basecaller.log dorado_demux.log dorado_summary.log

Here are the summary files obtained: summary_barcoded.txt summary_not_barcoded.txt

Thanks for the help!

luigilamparelli, May 07 '24 12:05

Hi @luigilamparelli,

Thanks for raising this. This looks like an issue in how the demux subcommand calculates the new number of samples in the read after barcode trimming: because the move table is missing at this point, we end up calculating the length as zero.

You can avoid this for now by performing the demux inline with basecalling (by providing the --kit-name parameter during basecalling) or by including the move table in the basecalling output (using the --emit-moves flag). Alternatively, you can skip trimming entirely with the --no-trim option.
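As a rough sketch, the three workarounds could be wrapped as below. The function names and the `DORADO` and `KIT` variables are hypothetical conveniences, the model and file arguments are placeholders supplied by the caller, and only the flags named above are used.

```shell
# DORADO can be overridden (e.g. DORADO=echo) to preview the command line
# without running anything.
DORADO="${DORADO:-dorado}"
KIT="SQK-NBD114-96"   # kit name taken from the report

# 1. Demultiplex inline with basecalling.
#    Usage: basecall_inline MODEL POD5 > calls.bam
basecall_inline() { "$DORADO" basecaller --kit-name "$KIT" "$1" "$2"; }

# 2. Keep the move table in the basecalling output for a later demux step.
#    Usage: basecall_with_moves MODEL POD5 > calls.bam
basecall_with_moves() { "$DORADO" basecaller --emit-moves "$1" "$2"; }

# 3. Demultiplex an existing BAM without trimming.
#    Usage: demux_no_trim BAM OUT_DIR
demux_no_trim() { "$DORADO" demux --output-dir "$2" --kit-name "$KIT" --no-trim "$1"; }
```

Only one of the three is needed; which one fits depends on whether the reads can be basecalled again and whether trimmed output is required.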

We'll investigate the best way to resolve this properly for a future release.

malton-ont, May 08 '24 13:05

Hi @luigilamparelli, a fix for this is included in dorado v0.7.0 and newer.

tijyojwad, Jun 05 '24 21:06