dorado
Insufficient memory to run inference on cuda:0
I ran Dorado with the command:

dorado correct -m herro-v1/ /home/yplee/strawberry/SRR21142895.fastq > corrected.fasta 2> log

The log file shows:

[2024-08-04 17:30:50.624] [info] Running: "correct" "-m" "herro-v1/" "/home/yplee/strawberry/SRR21142895.fastq"
terminate called after throwing an instance of 'std::runtime_error'
  what():  Insufficient memory to run inference on cuda:0
Run environment:
Dorado version: 0.7.3
Dorado command: dorado correct -m herro-v1/ /home/yplee/strawberry/SRR21142895.fastq > corrected.fasta 2> log
Operating system: Ubuntu 24.04
Hardware (CPUs, Memory, GPUs): AMD 9654, 512 GB RAM, NVIDIA 4070 Super (12 GB)
According to the main page: "The error correction tool is both compute and memory intensive. As a result, it is best run on a system with multiple high performance CPU cores ( > 64 cores), large system memory ( > 256GB) and a modern GPU with a large VRAM ( > 32GB)." I was able to run the correction on a 4090 with 24 GB, but it's unlikely to work with 12 GB.
I have the same problem with Dorado 0.7.3. However, Dorado 0.7.2 worked without this problem on exactly the same input file. Either VRAM requirements of Dorado increased, or a bug was introduced. My GPU is GeForce 2080Ti 12GB.
Hi @yplee614, yes as @kubek78 said and shared from the docs there is a very high resource requirement to run dorado correct.
@shelkmike, changes in dorado 0.7.3 resulted in an increase in resource requirements, but they should not exceed our stated recommendations.
Kind regards, Rich
We are trying to run a dorado (0.7.3) correct job but are overflowing our available memory.
The input file is ~130 GB.
Case 1: 4x A100 80 GB VRAM + 96 threads + 512 GB RAM
dorado correct -x cuda:all input_file > output_file (result: out of memory)
Case 2: 1x A100 80 GB VRAM + 96 threads + 512 GB RAM
dorado correct -x cuda:0 input_file > output_file (result: out of memory)
Case 3: 4x A100 80 GB VRAM + 96 threads + 1 TB RAM
dorado correct -x cuda:all input_file > output_file (result: 997/1008 GB used, about to run out of memory)
The output file is generated but never contains data.
edit: ran out of RAM on the 1 TB machine. Our input file is .fastq, output .fasta. I am running case 3 on Dorado v0.7.2 now.
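For an input this large it can help to know the read and base counts up front. A minimal sketch of one way to get them; toy.fastq is a made-up two-read stand-in for the real input:

```shell
# Build a tiny two-read FASTQ so the sketch runs anywhere.
printf '@r1\nACGT\n+\n!!!!\n@r2\nACGTACGT\n+\n!!!!!!!!\n' > toy.fastq
# Every 4th line starting at line 2 is a sequence line; sum reads and bases.
awk 'NR % 4 == 2 { reads++; bases += length($0) } END { print reads " reads, " bases " bases" }' toy.fastq
```

On the toy file this prints "2 reads, 12 bases"; on a real FASTQ the base total divided by genome size gives a quick coverage estimate.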
@KeygeneICT, thanks for the information. Approximately what depth is your input data?
Kind regards, Rich
thanks for the information. Approximately what depth is your input data?
I have received the following information about this input data: "50-60 Gb simplex .fasta data, ~100-120X coverage (assuming ~0.5Gb diploid heterozygous genome)"
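The quoted coverage figure is consistent with the data volume: total bases divided by genome size, e.g.

```shell
# ~50-60 Gb of reads over a ~0.5 Gb genome gives roughly 100-120x coverage.
awk 'BEGIN { print 50/0.5 "x to " 60/0.5 "x" }'
```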
Update: Running the same dataset on Dorado v0.7.2 has so far consumed at most 300 GB of RAM and seems to fit well within the available resources. The output file is growing properly as well (currently 18 GB).
Update 2: We are only experiencing excessive memory usage on v0.7.3. v0.7.2 is working correctly, so we will wait for a new release before upgrading from 0.7.2.
FWIW, I had similar issues running error correction. I was getting:
Insufficient memory to run inference on cuda:0
I managed to get around this using:
dorado correct --infer-threads 1 -b 64
It's faster using these arguments with 0.7.3 than reverting to 0.7.2. Watching with nvtop, I could see the GPU working harder using 0.7.3. We're only running 4070 GPUs, as it's early days for us and we're just testing the waters.
@simonhayns Thank you. "--infer-threads 1 -b 64" and "--infer-threads 1 -b 32" still required too much VRAM, but "--infer-threads 1 -b 16" worked for me.
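The pattern the two comments above converge on, halving -b until a run fits in VRAM, can be sketched as a small loop. run_correct below is a stand-in, stubbed to succeed only at -b 16 or below so the sketch runs anywhere; in practice it would wrap dorado correct --infer-threads 1 -b "$1" reads.fastq:

```shell
# Stand-in for: dorado correct --infer-threads 1 -b "$1" reads.fastq > corrected.fasta
# Stubbed to succeed only at -b 16 or below, mimicking a 12 GB card.
run_correct() { [ "$1" -le 16 ]; }

# Try progressively smaller batch sizes until one fits.
pick_batch() {
  for b in 64 32 16 8; do
    if run_correct "$b"; then
      echo "$b"
      return 0
    fi
  done
  return 1
}

pick_batch
```

With the stub above this prints 16, matching the batch size that worked in the comment it follows.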
@KeygeneICT, Could you try dorado-0.8.0 which has a number of stability improvements to dorado correct?
Thanks to all for their suggestions on reducing VRAM usage; there have also been updates to the dorado readme regarding dorado correct input data requirements.
Kind regards, Rich
@KeygeneICT, Could you try dorado-0.8.0 which has a number of stability improvements to dorado correct?
Apologies for the delay. I was able to test 0.8.0 last week using the same dataset as before, and we ran into the same situation we had with 0.7.3. I also tested 0.8.1 just now with --to-paf in order to exclude anything related to GPUs: no data output (empty file), 100% CPU load, stuck on "Loading alignments", memory slowly filling up until it reached the 1 TB RAM limit of the machine I was testing on.
Since this issue was opened, several releases of Dorado have come out, including the latest v0.9.5, which has improvements to both the overlap and the correction stages.
If the issue has not been resolved in the meantime, would it be possible that you try the latest version and report back if it works for you? Otherwise, please close the issue if possible.
On the chance that your input dataset has a region of extreme coverage, the new version would also likely produce fewer overlaps, but could still exhibit high memory usage.
Perhaps a quick way to estimate the maximum coverage from overlaps would be to count the number of reads in every target pile and to check the top scoring ones, something like this:
cut -f 6 overlaps.paf | sort | uniq -c | sort -k1,1 -nr | head
This takes only the target names from the PAF file, counts how many times each appears (sorting first, since uniq -c only groups adjacent duplicates), and sorts on that count in descending order. I'm curious what the query count is per target read.
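A self-contained illustration on a made-up three-line PAF (toy.paf and the read names are fabricated); note the sort before uniq -c, which needs grouped input:

```shell
# PAF column 6 holds the target read name; build a toy file with two targets.
printf 'q1\t100\t0\t100\t+\tread_A\t200\t0\t100\t90\t100\t60\n' >  toy.paf
printf 'q2\t100\t0\t100\t+\tread_A\t200\t0\t100\t90\t100\t60\n' >> toy.paf
printf 'q3\t100\t0\t100\t+\tread_B\t200\t0\t100\t90\t100\t60\n' >> toy.paf
# Count queries per target: sort first so uniq -c groups repeated names.
cut -f 6 toy.paf | sort | uniq -c | sort -k1,1 -nr | head
```

Here read_A tops the list with 2 queries; on real overlaps, a target with an extreme count would point at a region of extreme coverage.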
If this issue still persists, please reopen it. Since it has been stale for a while, I will close it now due to inactivity and lack of feedback.