Minor: Argument parsing with the --modified-bases parameter
Issue Report
Please describe the issue:
I am not very technical at the command line, so please excuse my ignorant use of terms.
When running modified base calls, the data dir argument was not correctly handled.
I expected the last argument to be parsed as the data file/dir. However, with the following commands, everything after --modified-bases was handled as a value for that parameter instead of being parsed as the data or model argument.
dorado basecaller hac --modified-bases 5mCG_5hmCG ${data_dir} > ${bam_calls}
dorado basecaller --modified-bases 5mCG_5hmCG hac ${data_dir} > ${bam_calls}
Steps to reproduce the issue:
Please list any steps to reproduce the issue.
Run environment:
- Dorado version: 0.7.1+80da5f5
- Dorado command: basecaller
- Operating system: Rocky Linux 8.8 (Green Obsidian)
- Hardware (CPUs, Memory, GPUs): HPCC
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.):
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
[2024-06-09 10:57:33.636] [info] Running: "basecaller" "--modified-bases" "5mCG_5hmCG" "hac" "../NoE2/"
[2024-06-09 10:57:34.341] [error] 'hac' is not a supported modification please select from pseU, m6A_DRACH, m6A, 6mA, 5mC, 5mCG_5hmCG, 5mCG, 5mC_5hmC, 4mC_5mC
Hi @rfran010,
Thanks for raising this issue. We're aware of it from other issues (see 744 and 745 for example). Please ensure that the --modified-bases argument is the last parameter passed to dorado, i.e. placed after the model and data arguments.
Closing as this is explained in more detail in the new docs.
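For anyone landing here later: the log above is consistent with --modified-bases collecting every value that follows it. The sketch below is a toy model of that behaviour in POSIX shell, assuming a greedy multi-value option. It is not Dorado's actual parser, and parse_args is a hypothetical helper written only for this illustration.

```shell
# Toy model of a greedy multi-value option. NOT Dorado's real parser;
# parse_args is a hypothetical helper used only for illustration.
parse_args() {
  positionals="" mods="" in_mods=0
  for arg in "$@"; do
    case "$arg" in
      --modified-bases) in_mods=1 ;;   # start collecting values for the option
      -*) in_mods=0 ;;                 # any other option ends the run (ignored here)
      *)
        if [ "$in_mods" -eq 1 ]; then
          mods="${mods:+$mods }$arg"                    # swallowed by --modified-bases
        else
          positionals="${positionals:+$positionals }$arg"  # treated as model / data
        fi
        ;;
    esac
  done
  echo "positionals=[$positionals] modified_bases=[$mods]"
}

# Orderings from the report: the option swallows whatever follows it.
parse_args hac --modified-bases 5mCG_5hmCG ../NoE2/
parse_args --modified-bases 5mCG_5hmCG hac ../NoE2/

# Suggested ordering: model and data first, option last.
parse_args hac ../NoE2/ --modified-bases 5mCG_5hmCG
```

In this toy model only the last ordering leaves both the model and the data directory as positional arguments. Applied to the command in the report, the suggested ordering would presumably be: dorado basecaller hac ${data_dir} --modified-bases 5mCG_5hmCG > ${bam_calls}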