
Insufficient memory to run inference on cuda:0

Open yplee614 opened this issue 1 year ago • 11 comments

I ran Dorado with the command:

dorado correct -m herro-v1/ /home/yplee/strawberry/SRR21142895.fastq > corrected.fasta 2> log

The log file shows:

[2024-08-04 17:30:50.624] [info] Running: "correct" "-m" "herro-v1/" "/home/yplee/strawberry/SRR21142895.fastq"
terminate called after throwing an instance of 'std::runtime_error'
what(): Insufficient memory to run inference on cuda:0

Run environment:

Dorado version: 0.7.3
Dorado command: dorado correct -m herro-v1/ /home/yplee/strawberry/SRR21142895.fastq > corrected.fasta 2> log
Operating system: Ubuntu 24.04
Hardware (CPUs, Memory, GPUs): AMD 9654, 512 GB, NVIDIA 4070 Super (12 GB)

yplee614 avatar Aug 04 '24 09:08 yplee614

According to the main page: "The error correction tool is both compute and memory intensive. As a result, it is best run on a system with multiple high performance CPU cores ( > 64 cores), large system memory ( > 256GB) and a modern GPU with a large VRAM ( > 32GB)." I was able to run the correction on a 4090 with 24 GB, but it's unlikely to work with 12 GB.

kubek78 avatar Aug 07 '24 10:08 kubek78

I have the same problem with Dorado 0.7.3. However, Dorado 0.7.2 worked without this problem on exactly the same input file. Either Dorado's VRAM requirements increased, or a bug was introduced. My GPU is a GeForce 2080 Ti with 12 GB.

shelkmike avatar Aug 08 '24 14:08 shelkmike

Hi @yplee614, yes, as @kubek78 said and quoted from the docs, there is a very high resource requirement to run dorado correct.

@shelkmike, changes in dorado 0.7.3 resulted in an increase in resource requirements, but they should not exceed our stated recommendations.

Kind regards, Rich

HalfPhoton avatar Aug 12 '24 09:08 HalfPhoton

We are trying to run a dorado (0.7.3) correct job but are overflowing our available memory. The input file is ~130 GB.

Case 1: 4x A100 80 GB VRAM + 96 threads + 512 GB RAM
dorado correct -x cuda:all input_file > output_file → out of memory

Case 2: 1x A100 80 GB VRAM + 96 threads + 512 GB RAM
dorado correct -x cuda:0 input_file > output_file → out of memory

Case 3: 4x A100 80 GB VRAM + 96 threads + 1 TB RAM
dorado correct -x cuda:all input_file > output_file → 997/1008 GB, about to run out of memory. The output file is generated but never contains data.

Edit: ran out of RAM on the 1 TB machine. Our input file is .fastq, output .fasta. I am running case 3 on Dorado v0.7.2 now.

KeygeneICT avatar Aug 12 '24 13:08 KeygeneICT

@KeygeneICT, thanks for the information. Approximately what depth is your input data?

Kind regards, Rich

HalfPhoton avatar Aug 12 '24 20:08 HalfPhoton

thanks for the information. Approximately what depth is your input data?

I have received the following information about this input data: "50-60 Gb simplex .fasta data, ~100-120X coverage (assuming ~0.5Gb diploid heterozygous genome)"

Update: Running the same dataset on Dorado v0.7.2 has so far consumed at most 300 GB of RAM and seems to fit well within the resources available. The output file is growing properly as well (currently 18 GB).

Update 2: We are only experiencing excessive memory usage on v0.7.3. v0.7.2 is working correctly, so we will wait for a new release before upgrading from 0.7.2.

KeygeneICT avatar Aug 13 '24 09:08 KeygeneICT

FWIW, I had similar issues running error correction. I was getting:

Insufficient memory to run inference on cuda:0

I managed to get around this using:

dorado correct --infer-threads 1 -b 64

It's faster using these arguments with 0.7.3 than reverting to 0.7.2. Watching with nvtop, I could see the GPU working harder using 0.7.3. We're only running 4070 GPUs, as it's early days for us and we're just testing the waters.

ghost avatar Aug 19 '24 21:08 ghost

@simonhayns Thank you. "--infer-threads 1 -b 64" and "--infer-threads 1 -b 32" still required too much VRAM, but "--infer-threads 1 -b 16" worked for me.

shelkmike avatar Aug 21 '24 07:08 shelkmike
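Since the workable batch size clearly varies with GPU (64 on one card, 16 on another), the trial-and-error above can be automated. The sketch below is a hypothetical helper, not part of dorado; it assumes only that dorado exits with a nonzero status on the "Insufficient memory" error, as reported in this thread.

```shell
# try_batches: run the given command with decreasing -b values until one
# succeeds. Progress messages go to stderr so stdout redirection of the
# corrected reads is not polluted.
try_batches() {
  for b in 64 32 16 8; do
    if "$@" -b "$b"; then
      echo "batch size $b fits" >&2
      return 0
    fi
  done
  echo "no batch size fit" >&2
  return 1
}

# Usage (requires dorado and real input data):
# try_batches dorado correct --infer-threads 1 -m herro-v1/ reads.fastq \
#   > corrected.fasta 2> log
```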

@KeygeneICT, Could you try dorado-0.8.0 which has a number of stability improvements to dorado correct?

Thanks to all for the suggestions on reducing VRAM usage; there have also been updates to the dorado readme regarding dorado correct input data requirements.

Kind regards, Rich

HalfPhoton avatar Sep 17 '24 09:09 HalfPhoton

@KeygeneICT, Could you try dorado-0.8.0 which has a number of stability improvements to dorado correct?

Apologies for the delay. I was able to test 0.8.0 last week using the same dataset as before, and we ran into the same situation as with 0.7.3. I also tested 0.8.1 just now with --to-paf in order to exclude anything GPU-related. No data output (empty file), 100% CPU load, stuck on "Loading alignments", memory slowly filling up until it reached the 1 TB RAM limit on the machine I was testing on.

KeygeneICT avatar Oct 07 '24 11:10 KeygeneICT

Since this issue was opened, several releases of Dorado have come out, including the latest v0.9.5, which has improvements to both the overlap and the correction stages.

If the issue has not been resolved in the meantime, would it be possible that you try the latest version and report back if it works for you? Otherwise, please close the issue if possible.

On the chance that your input dataset has a region of extreme coverage, the new version would also likely produce fewer overlaps, but could still exhibit high memory usage.

Perhaps a quick way to estimate the maximum coverage from overlaps would be to count the number of reads in every target pile and to check the top scoring ones, something like this:

cut -f 6 overlaps.paf | sort | uniq -c | sort -k 1,1 -n -r | head

This takes only the target names from the PAF file, sorts them so uniq -c can count every duplicate (uniq only collapses adjacent identical lines), and then sorts on that count in descending order. I'm curious what the query count is per target read.
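For very large PAF files, the same tally can be done in a few lines of Python without sorting, using a hash-based counter. This is a minimal sketch (the helper name and toy records are illustrative, not from dorado), assuming the standard PAF layout with the target name in column 6:

```python
from collections import Counter

def targets_by_read_count(paf_lines, top=10):
    """Count query reads per target (PAF column 6) and return the
    most frequent targets, highest count first."""
    counts = Counter()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6:
            counts[fields[5]] += 1
    return counts.most_common(top)

# Example with three toy 12-column PAF records:
paf = [
    "q1\t100\t0\t100\t+\ttA\t500\t0\t100\t90\t100\t60",
    "q2\t100\t0\t100\t+\ttA\t500\t0\t100\t90\t100\t60",
    "q3\t100\t0\t100\t+\ttB\t500\t0\t100\t90\t100\t60",
]
print(targets_by_read_count(paf))  # → [('tA', 2), ('tB', 1)]
```

A target whose count is far above the expected coverage (here, far above ~100-120x) would point at the extreme-coverage region mentioned above.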

svc-jstone avatar Apr 03 '25 09:04 svc-jstone

If this issue still persists, please reopen it. Since it has been stale for a while, I will close it now due to inactivity and lack of feedback.

svc-jstone avatar Oct 07 '25 08:10 svc-jstone