Restrict fastq output of demux to only samples included in the sample sheet
Issue Report
Please describe the issue:
Hi,
In version 0.7.2, dorado demux writes a FASTQ file (--emit-fastq) for every barcode in the given kit, even if the barcode is not listed in the sample sheet. Samples in the sample sheet are named with their alias (which is perfect), while the others are named something like SQK-NBD114-24_barcode12.fastq, SQK-NBD114-24_barcode13.fastq, etc.
Is it possible to limit the output to only the samples included in the sample sheet and suppress the others? I think this was the behaviour in older versions of dorado. In routine work, the samples listed in the sample sheet are the important (target) ones, while the others are of limited value.
Thanks in advance
Steps to reproduce the issue:
Run environment:
- Dorado version: 0.7.2
- Dorado command:
dorado-0.5 demux calls.bam --sample-sheet sample_sheet.csv --output-dir <outdir> --kit-name SQK-NBD114-24 --emit-fastq
- Operating system:
- Hardware (CPUs, Memory, GPUs):
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance):
- Source data location (on device or networked drive - NFS, etc.):
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
- Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)
Hi @MostafaYA, we'll look into this as a feature and get back to you.
Raised again in https://github.com/nanoporetech/dorado/issues/1162 - what is being requested is the intended behaviour, so I've changed this from enhancement to bug.
Thanks for your patience. This issue is resolved in dorado 0.9.0.
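For anyone who cannot upgrade yet, the extra files can be set aside after demultiplexing. The following is only a minimal sketch, not part of dorado. It assumes that the sample sheet is a CSV with an `alias` column, that listed samples are written as `<alias>.fastq`, and that unlisted barcodes follow the `<kit-name>_barcodeNN.fastq` pattern described above. The function names and the `unlisted` folder name are hypothetical.

```python
import csv
from pathlib import Path


def read_aliases(sample_sheet):
    """Return the set of non-empty values in the sheet's 'alias' column.

    Assumption: the sample sheet is a CSV file with a header row that
    contains an 'alias' column.
    """
    with open(sample_sheet, newline="") as handle:
        return {
            (row.get("alias") or "").strip()
            for row in csv.DictReader(handle)
            if (row.get("alias") or "").strip()
        }


def unlisted_fastqs(outdir, aliases, kit_name):
    """List FASTQ files named '<kit_name>_barcode*.fastq' whose stem is not an alias.

    Files that do not start with the kit prefix are left out of the result,
    so anything with an unexpected name is never touched.
    """
    prefix = f"{kit_name}_barcode"
    return sorted(
        path
        for path in Path(outdir).glob("*.fastq")
        if path.stem.startswith(prefix) and path.stem not in aliases
    )


def move_unlisted(sample_sheet, outdir, kit_name, dest_name="unlisted"):
    """Move unlisted barcode files into '<outdir>/<dest_name>' and return their names.

    Files are moved rather than deleted, so nothing is lost if one of the
    assumptions about file naming turns out to be wrong.
    """
    aliases = read_aliases(sample_sheet)
    dest = Path(outdir) / dest_name
    moved = []
    for path in unlisted_fastqs(outdir, aliases, kit_name):
        dest.mkdir(exist_ok=True)
        path.rename(dest / path.name)
        moved.append(path.name)
    return moved
```

With the command from the report, calling `move_unlisted("sample_sheet.csv", "<outdir>", "SQK-NBD114-24")` after demux would leave the alias-named files in place and move files such as SQK-NBD114-24_barcode12.fastq into an `unlisted` subfolder.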