torch.OutOfMemoryError: CUDA out of memory.

Open byeollee opened this issue 8 months ago • 3 comments

Hello, thanks for maintaining this useful project!

I didn't experience this issue when I downloaded and used the project previously, but now that I have downloaded it again, it is not working as expected.

Here is the command I ran.

medaka_consensus -i ./cluster_001/4_reads.fastq -d ./cluster_001/7_final_consensus.fasta -o ./medaka_25559_0423 -t 12 && cp medaka_25559_0423/consensus.fasta ./cluster_001/8_medaka.fasta && rm -r medaka_25559_0423

Logging

WARNING: Failed to detect a model version, will use default: 'r1041_e82_400bps_sup_v5.0.0'
Checking program versions
This is medaka 2.0.1
Program    Version    Required   Pass     
bcftools   1.13       1.11       True     
bgzip      1.13+ds    1.11       True     
minimap2   2.24       2.11       True     
samtools   1.12       1.11       True     
tabix      1.13+ds    1.11       True     
[19:46:50 - MdlStrTGZ] Successfully removed temporary files from /tmp/tmpwgn7f1xk.
[19:46:52 - MdlStrTGZ] Successfully removed temporary files from /tmp/tmp1ah_wcqq.
Aligning basecalls to draft
Creating fai index file /home/star/bioinfo/25559/trycycler_25559/cluster_001/7_final_consensus.fasta.fai
Creating mmi index file /home/star/bioinfo/25559/trycycler_25559/cluster_001/7_final_consensus.fasta.map-ont.mmi
[M::mm_idx_gen::0.165*1.01] collected minimizers
[M::mm_idx_gen::0.208*1.41] sorted minimizers
[M::main::0.259*1.33] loaded/built the index for 1 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.271*1.31] distinct minimizers: 846770 (97.94% are singletons); average occurrences: 1.030; average spacing: 5.349; total length: 4667057
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -I 16G -x map-ont -d /home/star/bioinfo/25559/trycycler_25559/cluster_001/7_final_consensus.fasta.map-ont.mmi /home/star/bioinfo/25559/trycycler_25559/cluster_001/7_final_consensus.fasta
[M::main] Real time: 0.277 sec; CPU: 0.362 sec; Peak RSS: 0.045 GB
[M::main::0.072*1.02] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.087*1.01] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.097*1.01] distinct minimizers: 846770 (97.94% are singletons); average occurrences: 1.030; average spacing: 5.349; total length: 4667057
[M::worker_pipeline::22.304*9.73] mapped 37907 sequences
[M::worker_pipeline::23.027*9.46] mapped 5082 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x map-ont --secondary=no -L --MD -A 2 -B 4 -O 4,24 -E 2,1 -t 12 -a /home/star/bioinfo/25559/trycycler_25559/cluster_001/7_final_consensus.fasta.map-ont.mmi /home/star/bioinfo/25559/trycycler_25559/cluster_001/4_reads.fastq
[M::main] Real time: 23.097 sec; CPU: 217.831 sec; Peak RSS: 4.018 GB
[bam_sort_core] merging from 0 files and 12 in-memory blocks...
Running medaka consensus
[19:47:24 - Predict] Processing region(s): cluster_001_consensus:0-4667057
[19:47:24 - Predict] Using model: /home/star/medaka/lib/python3.10/site-packages/medaka/data/r1041_e82_400bps_sup_v5.0.0_model_pt.tar.gz.
[19:47:24 - Predict] Using minimum mapQ threshold of 1 for read filtering.
[19:47:24 - Predict] Found a GPU.
[19:47:24 - MdlStrTGZ] Model GRUModel(
  (gru): GRU(10, 128, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=256, out_features=5, bias=True)
)
[19:47:24 - MdlStrTGZ] loading weights from /tmp/tmpxtvfd8rl/model/weights.pt
[19:47:24 - MdlStrTGZ] Successfully removed temporary files from /tmp/tmpxtvfd8rl.
[19:47:24 - Predict] Model device: cuda:0
[19:47:24 - Predict] Running prediction at half precision
[19:47:24 - BAMFile] Creating pool of 16 BAM file sets.
[19:47:24 - Predict] Processing 5 long region(s) with batching.
[19:47:24 - Sampler] Initializing sampler for consensus of region cluster_001_consensus:0-1000000.
[19:47:24 - Sampler] Initializing sampler for consensus of region cluster_001_consensus:999000-1999000.
[19:47:24 - PWorker] Running inference for 4.7M draft bases.
[19:47:29 - Feature] Processed cluster_001_consensus:0.0-999999.1 (median depth 120.0)
[19:47:29 - Sampler] Took 5.05s to make features.
[19:47:29 - Sampler] Initializing sampler for consensus of region cluster_001_consensus:1998000-2998000.
[19:47:29 - Feature] Processed cluster_001_consensus:999000.0-1998999.0 (median depth 119.0)
[19:47:29 - Sampler] Took 5.11s to make features.
[19:47:29 - Sampler] Initializing sampler for consensus of region cluster_001_consensus:2997000-3997000.
Traceback (most recent call last):
  File "/home/star/medaka/bin/medaka", line 8, in <module>
    sys.exit(main())
  File "/home/star/medaka/lib/python3.10/site-packages/medaka/medaka.py", line 836, in main
    args.func(args)
  File "/home/star/medaka/lib/python3.10/site-packages/medaka/prediction.py", line 171, in predict
    remainder_regions_depth = run_prediction(
  File "/home/star/medaka/lib/python3.10/site-packages/medaka/prediction.py", line 46, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/home/star/medaka/lib/python3.10/site-packages/medaka/models.py", line 301, in predict_on_batch
    x = self.forward(x).detach().cpu()
  File "/home/star/medaka/lib/python3.10/site-packages/medaka/models.py", line 337, in forward
    x = self.gru(x)[0]
  File "/home/star/medaka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/star/medaka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/star/medaka/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 1393, in forward
    result = _VF.gru(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.27 GiB. GPU 0 has a total capacity of 3.94 GiB of which 3.09 GiB is free. Including non-PyTorch memory, this process has 606.00 MiB memory in use. Of the allocated memory 528.32 MiB is allocated by PyTorch, and 23.68 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Failed to run medaka consensus.

This is the error I am getting.

Environment (if you do not have a GPU, write No GPU):

Installation method [from python - virtual ]
OS: [Ubuntu 22.04.5]
medaka version (2.0.1)
GPU model

*-display
     description: VGA compatible controller
     product: ASPEED Graphics Family [1A03:2000]
     vendor: ASPEED Technology, Inc. [1A03]
     physical id: 0
     bus info: pci@0000:08:00.0
     logical name: /dev/fb1
     version: 30
     width: 32 bits
     clock: 33MHz
     capabilities: vga_controller cap_list fb
     configuration: depth=32 driver=ast latency=0 resolution=1024,768
     resources: irq:16 memory:c4000000-c5ffffff memory:c6000000-c601ffff ioport:4000(size=128)
*-display
     description: VGA compatible controller
     product: GP107 [GeForce GTX 1050 Ti] [10DE:1C82]
     vendor: NVIDIA Corporation [10DE]
     physical id: 0
     bus info: pci@0000:81:00.0
     logical name: /dev/fb0
     version: a1
     width: 64 bits
     clock: 33MHz
     capabilities: vga_controller bus_master cap_list rom fb
     configuration: depth=32 driver=nvidia latency=0 resolution=800,600
     resources: irq:81 memory:fa000000-faffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:f000(size=128) memory:c0000-dffff

nvidia-smi

NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2
cuDNN version

However, I installed the CPU-only version using 'pip install medaka-cpu --extra-index-url https://download.pytorch.org/whl/cpu'.

Let me know if you need any more information. I'd be happy to help troubleshoot further. Thank you in advance for your time and support!

byeollee, Apr 23 '25 10:04

You can force medaka to run on the CPU only by setting CUDA_VISIBLE_DEVICES to an empty value:

CUDA_VISIBLE_DEVICES="" medaka_consensus ...
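
Nothing needs to be written to a file for this: CUDA_VISIBLE_DEVICES is an ordinary environment variable, and the prefix form above sets it only for that one command. If medaka is launched from a Python script instead, the same effect can be sketched as follows. The helper names cpu_only_env and run_cpu_only are made up for illustration and are not part of medaka.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides every GPU from CUDA-based libraries such as PyTorch.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(cmd):
    """Run cmd (a list of arguments) with GPUs hidden; raise if it fails."""
    return subprocess.run(cmd, env=cpu_only_env(), check=True)
```

For example, run_cpu_only(["medaka_consensus", "-i", "reads.fastq", "-d", "draft.fasta", "-o", "out_dir"]), where the file and directory names are placeholders.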

If you want to run on the GPU, you can reduce the memory footprint by reducing the batch size (medaka_consensus -b option) from the default value of 100 to a lower value.
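
As a rough guide for picking a value, the error message in this issue reports an attempted allocation of 6.27 GiB against 3.09 GiB of free GPU memory at the default batch size. The sketch below is a back-of-the-envelope estimate only. It assumes that the allocation scales roughly linearly with batch size, which is an assumption and not something medaka documents, and the function name and safety factor are illustrative.

```python
import math


def estimate_batch_size(current_batch, attempted_gib, free_gib, safety=0.5):
    """Scale the batch size so the failed allocation would fit in free memory.

    Assumes memory use grows linearly with batch size, which is only an
    approximation; `safety` leaves headroom for other allocations.
    """
    if attempted_gib <= 0 or free_gib <= 0:
        raise ValueError("memory figures must be positive")
    scaled = current_batch * (free_gib / attempted_gib) * safety
    return max(1, math.floor(scaled))


# Figures taken from the error message in this issue (default batch size 100).
print(estimate_batch_size(100, attempted_gib=6.27, free_gib=3.09))
```

With these figures the sketch suggests a batch size of around 24. Smaller values leave more headroom, likely at some cost in throughput.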

A secondary issue is that the CPU-only installation should not be fetching a cuda-capable version of pytorch. Was the installation done in a clean venv without a pre-existing install of pytorch? Please post the results of

pip show medaka-cpu
pip show torch
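
The Requires field of pip show torch is a quick hint here: a CUDA-enabled wheel on Linux typically lists nvidia-* packages, whereas a CPU-only wheel does not. The same check can be sketched from Python using only the standard library. The helper names are illustrative, and the nvidia- prefix test is a heuristic, not an official way to identify the build.

```python
from importlib import metadata


def nvidia_requirements(requires):
    """Return the requirement strings that name nvidia-* packages."""
    return [r for r in (requires or []) if r.strip().lower().startswith("nvidia-")]


def describe_torch_install():
    """Summarise the installed torch distribution without importing torch."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed in this environment"
    cuda_deps = nvidia_requirements(metadata.requires("torch"))
    kind = "looks CUDA-enabled" if cuda_deps else "looks CPU-only"
    return f"torch {version} {kind} ({len(cuda_deps)} nvidia-* requirements)"


print(describe_torch_install())
```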

ftostevin-ont, Apr 23 '25 12:04

Thank you for your kindness. I tried

CUDA_VISIBLE_DEVICES="" medaka_consensus ...

and checked the output of pip show for medaka-cpu and torch:

(medaka) star@star-Z10PE-D16-WS:~/bioinfo/25559/trycycler_25559$ pip show medaka-cpu
Name: medaka-cpu
Version: 2.0.1
Summary: Neural network sequence error correction.
Home-page: https://github.com/nanoporetech/medaka
Author: ont-research
Author-email:
License:
Location: /home/star/medaka/lib/python3.10/site-packages
Requires: cffi, edlib, h5py, intervaltree, numpy, ont-fast5-api, ont-mappy, ont-parasail, pysam, pyspoa, requests, torch, tqdm, wurlitzer
Required-by:
(medaka) star@star-Z10PE-D16-WS:~/bioinfo/25559/trycycler_25559$ pip show torch
Name: torch
Version: 2.6.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/star/medaka/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: medaka, medaka-cpu

But the same error still occurs.

Does this setting (CUDA_VISIBLE_DEVICES="") need to be placed in a file, or ...?

Following your other recommendation, I used '-b 10' instead of the default 100, and it works! Thank you for helping me!

byeollee, Apr 24 '25 00:04

My data was basecalled with Dorado version 0.9.1 using the super accurate (sup) model, and the flow cell was FLO-MIN114 (R10.4.1). What command should I use to run medaka? Or can I no longer use it in 2025? My genome is 40 Mb and it is a fungus.

Agridibuu, Apr 29 '25 06:04