Dorado aligner not working on input directory
Issue Report
Please describe the issue:
Hi, I am trying to re-align sequencing results from an adaptive sampling experiment, so I want to run dorado aligner on the fastq_pass/ directory. This option seems to exist according to the wording on GitHub, but I get an error message complaining that a directory is not a file, and when I use * to pass all files, it complains that it can't handle multiple files.
Steps to reproduce the issue:
Please list any steps to reproduce the issue.
Run environment:
- Dorado version:
- Dorado command: aligner
- Operating system: Ubuntu
- Hardware (CPUs, Memory, GPUs): Promethion 24
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): fastq
- Source data location (on device or networked drive - NFS, etc.): on device
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
~/dorado-0.5.3-linux-x64/bin/dorado aligner /data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna fastq_pass/ -t 8 -v
[2024-08-28 11:18:12.100] [debug] > aligner threads 7, writer threads 1
[2024-08-28 11:18:12.100] [info] > loading index /data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna
[E::hts_hopen] Failed to open file fastq_pass/
[E::hts_open_format] Failed to open file "fastq_pass/" : Is a directory
terminate called after throwing an instance of 'std::runtime_error'
what(): Could not open file: fastq_pass/
Aborted (core dumped)

~/dorado-0.5.3-linux-x64/bin/dorado aligner /data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna fastq_pass/* -t 8 -v
[2024-08-28 11:23:32.765] [debug] > aligner threads 7, writer threads 1
[2024-08-28 11:23:32.765] [error] > multi file input not yet handled
Hi @ilivyatan
You're just missing the required --output-dir argument.
The README could be clearer on this, for sure. It says: "When reading from an input folder, dorado aligner also supports emitting aligned files to an output folder, which will preserve the file structure of the inputs:"
$ dorado aligner <index> <input_read_folder> --output-dir <output_read_folder>
The output of dorado aligner --help is more useful:
-o, --output-dir
If specified output files will be written to the given folder, otherwise output is to stdout.
Required if the 'reads' positional argument is a folder.
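Applied to the command from the report, this might look like the sketch below. The reference path and fastq_pass/ are taken from the logs above; aligned_pass/ is a hypothetical output folder name, and DORADO is an illustrative variable that can point at the binary (for example ~/dorado-0.5.3-linux-x64/bin/dorado). It assumes the installed build lists --output-dir in dorado aligner --help. If dorado is not found, the script only prints the command it would run.

```shell
# Sketch only: paths come from the report; OUT is a hypothetical folder name.
DORADO="${DORADO:-dorado}"   # set DORADO to the full path of the binary if it is not on PATH
REF=/data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna
READS=fastq_pass/
OUT=aligned_pass/

# Assemble the command as positional parameters so each argument stays one word.
set -- "$DORADO" aligner "$REF" "$READS" --output-dir "$OUT" -t 8

if command -v "$DORADO" >/dev/null 2>&1; then
    # Folder input: aligned files are written under $OUT instead of stdout.
    "$@"
else
    # dorado is not available here, so just show the command.
    echo "dorado not found; would run: $*"
fi
```

Passing the folder itself rather than fastq_pass/* matters here: the shell expands the glob into many positional arguments, which is what triggered the "multi file input not yet handled" error in the second run.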
Best regards, Rich
https://dorado-docs.readthedocs.io/en/latest/basecaller/alignment/#writing-to-an-output-directory