dorado
Dorado duplex standalone v0.6.0: how to configure dorado to look for models in a different location?
Issue Report
My standalone installation of dorado on a Unix-based cluster did not come with the models. I downloaded the models manually, but dorado doesn't seem to see them. How can I configure dorado to look for models at this new location?
The original command searched for a model automatically, but it couldn't find it.
$ dorado duplex hac,5mCG_5hmCG pod5/ > Run1_barcode01_duplex.bam
[2024-04-26 14:20:55.833] [info] - downloading [email protected] with httplib
[2024-04-26 14:22:16.274] [error] Failed to download [email protected]: Could not establish connection
[2024-04-26 14:22:16.274] [info] - downloading [email protected] with curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:05:00 --:--:--     0
curl: (28) Connection timed out after 300000 milliseconds
[2024-04-26 14:27:16.320] [error] Failed to download [email protected]: ret=7168, errno=0
[2024-04-26 14:27:16.321] [error] finalise() not called on a HtsFile.
Aborted (core dumped)
- The above command failed because downloads are not allowed on the GPU compute nodes. This is a security rule that our computing facility has in place.
- I downloaded the models manually to this location:
$ ls /private_stores/mirror/dorado-db/20240501/ | head -n 3
[email protected]
[email protected]_5mCG@v2
[email protected]
- I reran the dorado command, but it is still trying to download the model.
- What environment variable should I set to make dorado look for models at that location?
- This is the structure of the installation folder for dorado on our cluster. I am a regular user and I don't have write access to it.
$ ls /home/apps/software/dorado/0.6.0/
bin easybuild lib lib64
Run environment:
- Dorado version: 0.6.0
- Dorado command: dorado duplex hac,5mCG_5hmCG pod5/ > Run1_barcode01_duplex.bam
- Operating system: 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Wed Apr 27 15:32:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Hardware (CPUs, Memory, GPUs):
$ nvidia-smi
Fri Apr 26 14:18:42 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:17:00.0 Off |                  N/A |
| 22%   50C    P0    62W / 260W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.): NFS
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): SQK-NBD111-24 MinKNOW (23.11.7)
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
- Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)
Hi @grendon,
Automatic model selection currently doesn't have an option to provide a search directory. This is something we can look to introduce in a future release.
In the meantime, you will need to specify the full path to the required basecall and modbase models:
dorado duplex \
<path to models>/[email protected] \
pod5 \
--modified-bases-models <path to models>/[email protected]_5mCG_5hmCG@v1 \
> calls.bam
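On a node that cannot download anything, it may help to confirm that both model directories exist before launching the run, so a wrong path fails immediately instead of falling through to a download attempt. The following is a minimal sketch, not part of dorado: `require_model`, `MODEL_DIR`, `BASECALL_MODEL`, and `MODBASE_MODEL` are hypothetical names, and the placeholders must be replaced with the directory names you actually downloaded.

```shell
#!/usr/bin/env bash
# Sketch only: require_model and the variable names below are hypothetical.

# Return non-zero (with a message on stderr) if a model directory is missing.
require_model() {
    if [ ! -d "$1" ]; then
        echo "model directory not found: $1" >&2
        return 1
    fi
}

# Example use (commented out; fill in the directory names you downloaded):
# MODEL_DIR=/private_stores/mirror/dorado-db/20240501
# BASECALL_MODEL="<basecall model directory name>"
# MODBASE_MODEL="<modbase model directory name>"
# require_model "$MODEL_DIR/$BASECALL_MODEL" &&
# require_model "$MODEL_DIR/$MODBASE_MODEL" &&
# dorado duplex "$MODEL_DIR/$BASECALL_MODEL" pod5 \
#     --modified-bases-models "$MODEL_DIR/$MODBASE_MODEL" > calls.bam
```

Keeping the models directory in one variable means only a single line changes when the mirror is updated to a new dated folder.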
@grendon, if the models are in the current working directory, they will be found automatically and not downloaded again.
You can achieve this by symbolically linking the downloaded models into the current working directory so that you can still use the same model complex command you described.
ln -s /path/to/models/dna_r10.4.1* .
dorado duplex hac,5mCG_5hmCG pod5 > Run1_barcode01_duplex.bam
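If you run from several working directories, the linking step can be wrapped in a small function. This is only a sketch under assumptions: `link_models` is a hypothetical helper name, and it assumes the downloaded model directories all start with `dna_r10.4.1`, as in the `ln -s` pattern above.

```shell
#!/usr/bin/env bash
# Sketch only: link_models is a hypothetical helper, not a dorado command.

# Symlink every model directory matching dna_r10.4.1* from $1 into $2
# (default: the current directory). Existing links are replaced.
link_models() {
    local src=$1 dest=${2:-.} m
    for m in "$src"/dna_r10.4.1*; do
        # Skip non-directories, including the unexpanded pattern when nothing matches.
        [ -d "$m" ] || continue
        ln -sfn "$m" "$dest/$(basename "$m")"
    done
}

# Example use (commented out):
# link_models /private_stores/mirror/dorado-db/20240501 .
# dorado duplex hac,5mCG_5hmCG pod5 > Run1_barcode01_duplex.bam
```

Because the links are replaced on each call, rerunning the function after the mirror moves to a new dated folder repoints them without manual cleanup.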
Great! I will try again with your suggestions. Thanks