
Dorado duplex stand alone v 0.6.0 : when the models are in a different location. How to configure dorado to look for models elsewhere?

Open grendon opened this issue 9 months ago • 2 comments

Issue Report

My standalone installation of dorado on a Unix-based cluster did not come with the models. I downloaded the models manually, but dorado doesn't seem to see them. How can I configure dorado to look for models at this new location?

The original command searched for a model automatically, but it couldn't find it.

$ dorado duplex hac,5mCG_5hmCG pod5/ > Run1_barcode01_duplex.bam

[2024-04-26 14:20:55.833] [info]  - downloading [email protected] with httplib
[2024-04-26 14:22:16.274] [error] Failed to download [email protected]: Could not establish connection
[2024-04-26 14:22:16.274] [info]  - downloading [email protected] with curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:05:00 --:--:--     0
curl: (28) Connection timed out after 300000 milliseconds
[2024-04-26 14:27:16.320] [error] Failed to download [email protected]: ret=7168, errno=0
[2024-04-26 14:27:16.321] [error] finalise() not called on a HtsFile.
Aborted (core dumped)
  • The above command failed because download commands cannot be performed on the compute node with GPU. It's a security rule that our computing facility has in place.

  • I downloaded the models manually to this location

$ ls /private_stores/mirror/dorado-db/20240501/ | head -n 3
[email protected]
[email protected]_5mCG@v2
[email protected]
  • I reran the dorado command, but it is still trying to download the model.

  • What system variable should I set in order to configure dorado to look for models at that location?

  • This is the structure of the installation folder for dorado on our cluster. I am a regular user and I don't have write access to it.

ls /home/apps/software/dorado/0.6.0/
bin
easybuild  
lib  
lib64

Run environment:

  • Dorado version: 0.6.0

  • Dorado command: dorado duplex hac,5mCG_5hmCG pod5/ > Run1_barcode01_duplex.bam

  • Operating system: 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Wed Apr 27 15:32:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  • Hardware (CPUs, Memory, GPUs):

$ nvidia-smi

Fri Apr 26 14:18:42 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:17:00.0 Off |                  N/A |
| 22%   50C    P0    62W / 260W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
  • Source data location (on device or networked drive - NFS, etc.): NFS
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): SQK-NBD111-24 MinKNOW (23.11.7)
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

  • Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

grendon avatar May 01 '24 18:05 grendon

Hi @grendon,

Automatic model selection currently doesn't have an option to provide a search directory. This is something we can look to introduce in a future release.

In the meantime, you will need to specify the full path to the required basecall and modbase models:

dorado duplex \
    <path to models>/[email protected] \
    pod5 \
    --modified-bases-models <path to models>/[email protected]_5mCG_5hmCG@v1 \
> calls.bam
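
Since the compute node cannot fall back to downloading, it may help to confirm that the model paths exist before launching the job. Below is a minimal sketch, assuming each model is a directory on disk; `check_models`, `MODEL_DIR`, `SIMPLEX_MODEL` and `MODBASE_MODEL` are hypothetical names used for illustration and are not part of dorado.

```shell
# Hypothetical helper (not part of dorado): succeed only if every
# path given as an argument is an existing directory.
check_models() {
    missing=0
    for path in "$@"; do
        if [ ! -d "$path" ]; then
            echo "missing model directory: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use; the three variables are placeholders to fill in yourself:
# check_models "$MODEL_DIR/$SIMPLEX_MODEL" "$MODEL_DIR/$MODBASE_MODEL" &&
#     dorado duplex "$MODEL_DIR/$SIMPLEX_MODEL" pod5 \
#         --modified-bases-models "$MODEL_DIR/$MODBASE_MODEL" > calls.bam
```

The check reports every missing path on stderr and returns non-zero, so the dorado command after `&&` only runs when all paths are present.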

malton-ont avatar May 02 '24 08:05 malton-ont

@grendon, if the models are in the current working directory, they will be found automatically and not downloaded again.

You can achieve this by symbolically linking the downloaded models into the current working directory so that you can still use the same model complex command you described.

ln -s /path/to/models/dna_r10.4.1* .
dorado duplex hac,5mCG_5hmCG pod5 > Run1_barcode01_duplex.bam
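
If the job is rerun from the same directory, a plain `ln -s` will complain about links that already exist. The following is a rough sketch of a re-runnable variant, assuming the models are directories and that the store path is absolute; `link_models` is a hypothetical helper name, not a dorado feature.

```shell
# Hypothetical helper (not part of dorado): symlink every model directory
# from a shared store into a working directory, skipping names that are
# already present there. Pass an absolute store path so the links resolve.
link_models() {
    src_dir=$1
    dest_dir=${2:-.}
    for model in "$src_dir"/*; do
        # Only directories are linked; stray files are ignored.
        [ -d "$model" ] || continue
        name=$(basename "$model")
        # Leave existing entries (including old links) untouched.
        if [ -e "$dest_dir/$name" ] || [ -L "$dest_dir/$name" ]; then
            continue
        fi
        ln -s "$model" "$dest_dir/$name"
    done
}

# Example use with the store location from the original post:
# link_models /private_stores/mirror/dorado-db/20240501 .
# dorado duplex hac,5mCG_5hmCG pod5 > Run1_barcode01_duplex.bam
```

This links everything in the store rather than only the `dna_r10.4.1*` entries, which is a design choice for the sketch; narrow the glob if the store holds unrelated directories.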

HalfPhoton avatar May 03 '24 09:05 HalfPhoton

Great! I will try again with your suggestions. Thanks

grendon avatar May 15 '24 19:05 grendon