Dorado aligner not working on input directory

Open ilivyatan opened this issue 1 year ago • 1 comments

Issue Report

Please describe the issue:

Hi, I am trying to re-align sequencing results from an adaptive sampling experiment, so I want to run dorado aligner on the fastq_pass/ directory. This option seems to exist according to the wording on GitHub, but I get an error message complaining that a directory is not a file. When I use * to pass all files, it complains that it cannot handle multiple files.

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

  • Dorado version:
  • Dorado command: aligner
  • Operating system: Ubuntu
  • Hardware (CPUs, Memory, GPUs): Promethion 24
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): fastq
  • Source data location (on device or networked drive - NFS, etc.): on device
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

~/dorado-0.5.3-linux-x64/bin/dorado aligner /data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna fastq_pass/ -t 8 -v
[2024-08-28 11:18:12.100] [debug] > aligner threads 7, writer threads 1
[2024-08-28 11:18:12.100] [info] > loading index /data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna
[E::hts_hopen] Failed to open file fastq_pass/
[E::hts_open_format] Failed to open file "fastq_pass/" : Is a directory
terminate called after throwing an instance of 'std::runtime_error'
what(): Could not open file: fastq_pass/
Aborted (core dumped)

~/dorado-0.5.3-linux-x64/bin/dorado aligner /data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna fastq_pass/* -t 8 -v
[2024-08-28 11:23:32.765] [debug] > aligner threads 7, writer threads 1
[2024-08-28 11:23:32.765] [error] > multi file input not yet handled

ilivyatan, Aug 28 '24 08:08

Hi @ilivyatan

You're just missing the required --output-dir argument.

The README could be clearer on this, for sure. It says: "When reading from an input folder, dorado aligner also supports emitting aligned files to an output folder, which will preserve the file structure of the inputs":

$ dorado aligner <index> <input_read_folder> --output-dir <output_read_folder>

The output of dorado aligner --help is more useful:

-o, --output-dir  
      If specified output files will be written to the given folder, otherwise output is to stdout. 
      Required if the 'reads' positional argument is a folder.
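
Applied to the command from the report, the fix might look like the sketch below. The reference path, input folder, and -t 8 come from the logs above; the output folder name aligned_pass and the DORADO, REF, READS_DIR and OUT_DIR variables are assumptions made for illustration only. The script runs dorado only if it is on the PATH and the input folder exists, and otherwise just prints the command it would have run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values from the report above; OUT_DIR ("aligned_pass") is a hypothetical name.
DORADO="${DORADO:-dorado}"
REF="${REF:-/data/referncess/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna}"
READS_DIR="${READS_DIR:-fastq_pass}"
OUT_DIR="${OUT_DIR:-aligned_pass}"

# Print the command line; a folder input needs --output-dir (see --help above).
build_cmd() {
  printf '%s aligner %s %s --output-dir %s -t 8\n' \
    "$DORADO" "$REF" "$READS_DIR" "$OUT_DIR"
}

if command -v "$DORADO" >/dev/null 2>&1 && [ -d "$READS_DIR" ]; then
  # Real run: aligned files are written under OUT_DIR.
  "$DORADO" aligner "$REF" "$READS_DIR" --output-dir "$OUT_DIR" -t 8
else
  # Dry run: dorado or the input folder is not available here.
  echo "dry run: $(build_cmd)"
fi
```

Passing the folder itself (fastq_pass) rather than fastq_pass/* matters here: the shell expands the glob into many file arguments, which is what triggered the "multi file input not yet handled" error.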

Best regards, Rich

HalfPhoton, Aug 28 '24 09:08

https://dorado-docs.readthedocs.io/en/latest/basecaller/alignment/#writing-to-an-output-directory

HalfPhoton, Nov 12 '24 16:11