CUDA OUT OF MEMORY (Dependent on # of GPUs)
Issue Report
Please describe the issue: CUDA OUT OF MEMORY PyTorch ERROR (Dependent on # of GPUs)
PyTorch fails to allocate memory (even though memory is available in excess, per nvidia-smi output) when initializing Dorado across more than one A4000 GPU. I would expect Dorado to be able to use the two GPUs in parallel without having to limit memory usage manually via --batchsize.
Steps to reproduce the issue:
When Dorado is initialized with two GPUs, whether by letting it detect the GPUs automatically, by passing --device cuda:0,1, or by setting CUDA_VISIBLE_DEVICES=0,1 for PyTorch, PyTorch raises a "CUDA out of memory" error. If only one GPU is used (--device cuda:0 or --device cuda:1), the error does not occur. Limiting the batch size (--batchsize 250) also allows Dorado to run to completion. However, when we switch to the SUP model, we have not found any batch size that allows Dorado to run across the two GPUs. Note: each A4000 GPU has 16 GB of RAM.
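A minimal sketch of the commands involved (model name and paths as in the full command listed under "Run environment" below; other options omitted for brevity):

# Fails with "CUDA out of memory" when both A4000s are visible:
dorado basecaller [email protected] pod5s/ -v --device cuda:0,1 >calls.bam
CUDA_VISIBLE_DEVICES=0,1 dorado basecaller [email protected] pod5s/ -v >calls.bam

# Runs without error on a single GPU:
dorado basecaller [email protected] pod5s/ -v --device cuda:0 >calls.bam

# Runs to completion on both GPUs (with this model) only when the batch size is capped manually:
dorado basecaller [email protected] pod5s/ -v --device cuda:0,1 --batchsize 250 >calls.bam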
Run environment:
- Dorado version: 0.5.2
- CUDA version: 12.3. The paths to all libraries included with 0.5.2 are explicitly defined.
- Dorado command: dorado basecaller [email protected] pod5s/ -v --modified-bases 6mA 5mC_5hmC --reference AssemblyScaffolds.fasta --kit-name SQK-NBD114-96 >calls.bam
- Operating system: RHEL 8
- Hardware (CPUs, Memory, GPUs): Intel Xeon Gold 36 core, 96GB RAM, 2X A4000 Nvidia GPUs
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.): local SSD
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): ~300GB
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
- Please provide output trace of dorado (run dorado with -v, or -vv on a small subset): I will follow up with specific errors when my machine is brought back online.
Currently unavailable, will update when possible.
Hi @ericmsmall, thanks for the detailed report.
When initializing Dorado with 2X GPUs...
Just to clarify: does the error occur before basecalling starts, i.e. during auto batch size selection?
We're continuously working on improving the auto batch size algorithm to get the best basecalling performance out of the hardware while remaining stable. I'll report your specific hardware configuration back to the team to see what we can do in a future release of Dorado.
The -v verbose output now shows much more information about the auto batch size calculation. This should help guide you to the optimal batch size for your hardware if you wish to experiment further around your known-good -b 250 value.
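For example, you could rerun a small subset of your pod5s with a few fixed batch sizes around that value and compare the verbose logs. A rough sketch, where pod5_subset/ is a hypothetical directory holding a handful of files:

# Sweep a few fixed batch sizes on a small pod5 subset and keep the verbose logs
for b in 250 320 384 448; do
  dorado basecaller [email protected] pod5_subset/ -v -b "$b" >calls_b${b}.bam 2>dorado_b${b}.log
done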
Kind regards, Rich
@HalfPhoton Here are the console outputs from dorado -v. Thanks!
SUP with batchsize 250
dorado basecaller [email protected] pod5s/ -v --modified-bases 6mA 5mC_5hmC --batchsize 250 --reference X.fasta --kit-name SQK-NBD114-96 >calls.bam
[2024-02-07 11:58:08.679] [debug] - matching modification model found: [email protected]_6mA@v2
[2024-02-07 11:58:08.680] [debug] - matching modification model found: [email protected]_5mC_5hmC@v1
[2024-02-07 11:58:08.680] [info] > Creating basecall pipeline
[2024-02-07 11:58:26.831] [debug] cuda:1 memory available: 14.54GB
[2024-02-07 11:58:26.831] [debug] Auto batchsize cuda:1: memory limit 13.54GB
[2024-02-07 11:58:26.831] [debug] Maximum safe estimated batch size for cuda:1: 512
[2024-02-07 11:58:26.831] [debug] Device cuda:1 Model memory 8.73GB
[2024-02-07 11:58:26.831] [debug] Device cuda:1 Decode memory 3.61GB
[2024-02-07 11:58:26.832] [debug] cuda:0 memory available: 14.57GB
[2024-02-07 11:58:26.832] [debug] Auto batchsize cuda:0: memory limit 13.57GB
[2024-02-07 11:58:26.832] [debug] Maximum safe estimated batch size for cuda:0: 512
[2024-02-07 11:58:26.832] [debug] Device cuda:0 Model memory 8.73GB
[2024-02-07 11:58:26.832] [debug] Device cuda:0 Decode memory 3.61GB
[2024-02-07 11:58:27.840] [warning] - set batch size for cuda:0 to 256
[2024-02-07 11:58:27.848] [warning] - set batch size for cuda:1 to 256
[2024-02-07 11:58:27.849] [debug] - adjusted chunk size to match model stride: 10000 -> 9996
[2024-02-07 11:58:29.122] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-02-07 11:58:29.710] [debug] Creating barcoding info for kit: SQK-NBD114-96
[2024-02-07 11:58:29.710] [info] Barcode for SQK-NBD114-96
[2024-02-07 11:58:29.711] [debug] - adjusted overlap to match model stride: 500 -> 498
[2024-02-07 11:58:29.727] [debug] Load reads from file Nanno/PAS40213_fail_barcode01_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:29.729] [debug] Load reads from file Nanno/PAS40213_fail_barcode03_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:29.732] [debug] Load reads from file Nanno/PAS40213_fail_barcode04_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.233] [debug] Load reads from file Nanno/PAS40213_fail_barcode05_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.234] [debug] Load reads from file Nanno/PAS40213_fail_barcode06_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.235] [debug] Load reads from file Nanno/PAS40213_fail_barcode07_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.236] [debug] Load reads from file Nanno/PAS40213_fail_barcode09_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.237] [debug] Load reads from file Nanno/PAS40213_fail_barcode10_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.246] [debug] Load reads from file Nanno/PAS40213_fail_barcode11_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.247] [debug] Load reads from file Nanno/PAS40213_fail_barcode12_cf87646e_8cd667f2_0.pod5
[2024-02-07 11:58:30.973] [debug] > Kits to evaluate: 1
[2024-02-07 11:58:35.724] [warning] Caught Torch error 'CUDA out of memory. Tried to allocate 1.68 GiB (GPU 1; 15.71 GiB total capacity; 4.89 GiB already allocated; 1.07 GiB free; 5.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF', clearing CUDA cache and retrying.
HAC or SUP without batchsize
dorado basecaller [email protected] pod5s/ -v --modified-bases 6mA 5mC_5hmC --reference X.fasta --kit-name SQK-NBD114-96 >calls.bam
[2024-02-07 12:00:54.633] [debug] - matching modification model found: [email protected]_6mA@v2
[2024-02-07 12:00:54.633] [debug] - matching modification model found: [email protected]_5mC_5hmC@v1
[2024-02-07 12:00:54.634] [info] > Creating basecall pipeline
[2024-02-07 12:00:56.223] [debug] cuda:0 memory available: 14.60GB
[2024-02-07 12:00:56.223] [debug] Auto batchsize cuda:0: memory limit 13.60GB
[2024-02-07 12:00:56.223] [debug] Auto batchsize cuda:0: testing up to 2048 in steps of 64
[2024-02-07 12:00:56.242] [debug] cuda:1 memory available: 14.51GB
[2024-02-07 12:00:56.242] [debug] Auto batchsize cuda:1: memory limit 13.51GB
[2024-02-07 12:00:56.242] [debug] Auto batchsize cuda:1: testing up to 1984 in steps of 64
[2024-02-07 12:00:56.561] [debug] Auto batchsize cuda:0: 64, time per chunk 2.405376 ms
[2024-02-07 12:00:56.564] [debug] Auto batchsize cuda:1: 64, time per chunk 2.414515 ms
[2024-02-07 12:00:56.880] [debug] Auto batchsize cuda:0: 128, time per chunk 1.241856 ms
[2024-02-07 12:00:56.883] [debug] Auto batchsize cuda:1: 128, time per chunk 1.242438 ms
[2024-02-07 12:00:57.192] [debug] Auto batchsize cuda:0: 192, time per chunk 0.811270 ms
[2024-02-07 12:00:57.195] [debug] Auto batchsize cuda:1: 192, time per chunk 0.809908 ms
[2024-02-07 12:00:57.512] [debug] Auto batchsize cuda:0: 256, time per chunk 0.623080 ms
[2024-02-07 12:00:57.517] [debug] Auto batchsize cuda:1: 256, time per chunk 0.627692 ms
[2024-02-07 12:00:57.834] [debug] Auto batchsize cuda:0: 320, time per chunk 0.501120 ms
[2024-02-07 12:00:57.836] [debug] Auto batchsize cuda:1: 320, time per chunk 0.498966 ms
[2024-02-07 12:00:58.156] [debug] Auto batchsize cuda:0: 384, time per chunk 0.419293 ms
[2024-02-07 12:00:58.160] [debug] Auto batchsize cuda:1: 384, time per chunk 0.421686 ms
[2024-02-07 12:00:58.475] [debug] Auto batchsize cuda:0: 448, time per chunk 0.353337 ms
[2024-02-07 12:00:58.482] [debug] Auto batchsize cuda:1: 448, time per chunk 0.357237 ms
[2024-02-07 12:00:58.792] [debug] Auto batchsize cuda:0: 512, time per chunk 0.309610 ms
[2024-02-07 12:00:58.800] [debug] Auto batchsize cuda:1: 512, time per chunk 0.310585 ms
[2024-02-07 12:00:59.123] [debug] Auto batchsize cuda:0: 576, time per chunk 0.287176 ms
[2024-02-07 12:00:59.125] [debug] Auto batchsize cuda:1: 576, time per chunk 0.282584 ms
[2024-02-07 12:00:59.451] [debug] Auto batchsize cuda:1: 640, time per chunk 0.254069 ms
[2024-02-07 12:00:59.454] [debug] Auto batchsize cuda:0: 640, time per chunk 0.258116 ms
[2024-02-07 12:00:59.780] [debug] Auto batchsize cuda:1: 704, time per chunk 0.233327 ms
[2024-02-07 12:00:59.782] [debug] Auto batchsize cuda:0: 704, time per chunk 0.233123 ms
[2024-02-07 12:01:00.109] [debug] Auto batchsize cuda:1: 768, time per chunk 0.214251 ms
[2024-02-07 12:01:00.111] [debug] Auto batchsize cuda:0: 768, time per chunk 0.213947 ms
[2024-02-07 12:01:00.437] [debug] Auto batchsize cuda:1: 832, time per chunk 0.196385 ms
[2024-02-07 12:01:00.445] [debug] Auto batchsize cuda:0: 832, time per chunk 0.199301 ms
[2024-02-07 12:01:00.766] [debug] Auto batchsize cuda:1: 896, time per chunk 0.183439 ms
[2024-02-07 12:01:00.778] [debug] Auto batchsize cuda:0: 896, time per chunk 0.186150 ms
[2024-02-07 12:01:01.097] [debug] Auto batchsize cuda:1: 960, time per chunk 0.171900 ms
[2024-02-07 12:01:01.110] [debug] Auto batchsize cuda:0: 960, time per chunk 0.172577 ms
[2024-02-07 12:01:01.433] [debug] Auto batchsize cuda:1: 1024, time per chunk 0.163234 ms
[2024-02-07 12:01:01.443] [debug] Auto batchsize cuda:0: 1024, time per chunk 0.162131 ms
[2024-02-07 12:01:01.783] [debug] Auto batchsize cuda:1: 1088, time per chunk 0.160965 ms
[2024-02-07 12:01:01.790] [debug] Auto batchsize cuda:0: 1088, time per chunk 0.157000 ms
[2024-02-07 12:01:02.137] [debug] Auto batchsize cuda:1: 1152, time per chunk 0.146707 ms
[2024-02-07 12:01:02.149] [debug] Auto batchsize cuda:0: 1152, time per chunk 0.148872 ms
[2024-02-07 12:01:02.478] [debug] Auto batchsize cuda:1: 1216, time per chunk 0.139191 ms
[2024-02-07 12:01:02.488] [debug] Auto batchsize cuda:0: 1216, time per chunk 0.138505 ms
[2024-02-07 12:01:02.817] [debug] Auto batchsize cuda:1: 1280, time per chunk 0.130977 ms
[2024-02-07 12:01:02.842] [debug] Auto batchsize cuda:0: 1280, time per chunk 0.133061 ms
[2024-02-07 12:01:03.158] [debug] Auto batchsize cuda:1: 1344, time per chunk 0.127034 ms
[2024-02-07 12:01:03.185] [debug] Auto batchsize cuda:0: 1344, time per chunk 0.127408 ms
[2024-02-07 12:01:03.506] [debug] Auto batchsize cuda:1: 1408, time per chunk 0.122362 ms
[2024-02-07 12:01:03.542] [debug] Auto batchsize cuda:0: 1408, time per chunk 0.125748 ms
[2024-02-07 12:01:03.867] [debug] Auto batchsize cuda:1: 1472, time per chunk 0.121551 ms
[2024-02-07 12:01:03.912] [debug] Auto batchsize cuda:0: 1472, time per chunk 0.124647 ms
[2024-02-07 12:01:04.239] [debug] Auto batchsize cuda:1: 1536, time per chunk 0.120089 ms
[2024-02-07 12:01:04.295] [debug] Auto batchsize cuda:0: 1536, time per chunk 0.123888 ms
[2024-02-07 12:01:04.611] [debug] Auto batchsize cuda:1: 1600, time per chunk 0.113680 ms
[2024-02-07 12:01:04.669] [debug] Auto batchsize cuda:0: 1600, time per chunk 0.116839 ms
[2024-02-07 12:01:04.983] [debug] Auto batchsize cuda:1: 1664, time per chunk 0.110649 ms
[2024-02-07 12:01:05.054] [debug] Auto batchsize cuda:0: 1664, time per chunk 0.114667 ms
[2024-02-07 12:01:05.364] [debug] Auto batchsize cuda:1: 1728, time per chunk 0.108722 ms
[2024-02-07 12:01:05.449] [debug] Auto batchsize cuda:0: 1728, time per chunk 0.113264 ms
[2024-02-07 12:01:05.757] [debug] Auto batchsize cuda:1: 1792, time per chunk 0.107558 ms
[2024-02-07 12:01:05.853] [debug] Auto batchsize cuda:0: 1792, time per chunk 0.111525 ms
[2024-02-07 12:01:06.156] [debug] Auto batchsize cuda:1: 1856, time per chunk 0.106708 ms
[2024-02-07 12:01:06.275] [debug] Auto batchsize cuda:0: 1856, time per chunk 0.110296 ms
[2024-02-07 12:01:06.557] [debug] Auto batchsize cuda:1: 1920, time per chunk 0.104293 ms
[2024-02-07 12:01:06.693] [debug] Auto batchsize cuda:0: 1920, time per chunk 0.108546 ms
[2024-02-07 12:01:06.965] [debug] Auto batchsize cuda:1: 1984, time per chunk 0.102632 ms
[2024-02-07 12:01:06.965] [debug] Device cuda:1 Model memory 9.31GB
[2024-02-07 12:01:06.965] [debug] Device cuda:1 Decode memory 3.84GB
[2024-02-07 12:01:07.621] [info] - set batch size for cuda:0 to 2048
[2024-02-07 12:01:07.682] [info] - set batch size for cuda:1 to 1984
[2024-02-07 12:01:07.683] [debug] - adjusted chunk size to match model stride: 10000 -> 9996
[2024-02-07 12:01:08.523] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-02-07 12:01:09.075] [debug] Creating barcoding info for kit: SQK-NBD114-96
[2024-02-07 12:01:09.075] [info] Barcode for SQK-NBD114-96
[2024-02-07 12:01:09.075] [debug] - adjusted overlap to match model stride: 500 -> 498
[2024-02-07 12:01:09.092] [debug] Load reads from file Nanno/PAS40213_fail_barcode01_cf87646e_8cd667f2_0.pod5
[2024-02-07 12:01:09.094] [debug] Load reads from file Nanno/PAS40213_fail_barcode03_cf87646e_8cd667f2_0.pod5
[2024-02-07 12:01:09.096] [debug] Load reads from file Nanno/PAS40213_fail_barcode04_cf87646e_8cd667f2_0.pod5
[2024-02-07 12:01:09.205] [warning] Caught Torch error 'CUDA out of memory. Tried to allocate 8.95 GiB (GPU 0; 15.73 GiB total capacity; 175.37 MiB already allocated; 5.68 GiB free; 254.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF', clearing CUDA cache and retrying.
terminate called after throwing an instance of 'c10::OutOfMemoryError'
what(): CUDA out of memory. Tried to allocate 8.95 GiB (GPU 0; 15.73 GiB total capacity; 175.37 MiB already allocated; 5.68 GiB free; 254.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at /pytorch/pyold/c10/cuda/CUDACachingAllocator.cpp:913 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f9724f639b7 in /home/XXXXXXX/Dorado/dorado-0.5.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1:
Aborted (core dumped)
Those debug traces suggest that you might have other processes running on your GPUs. These can have a negative impact on the auto batch size calculation. Here we see that your ~16 GiB card has only ~6 GiB free:
[2024-02-07 12:01:09.205] [warning] Caught Torch error 'CUDA out of memory. Tried to allocate 8.95 GiB (GPU 0; 15.73 GiB total capacity; ... 5.68 GiB free;
You should be able to see what else is running on your GPUs by running nvidia-smi.
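For example, polling per-GPU memory once a second with standard nvidia-smi query options while Dorado starts up in another terminal will show whether anything else is holding memory:

# Print index, used memory, and total memory for each GPU every second
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1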
At first I thought that might be the case as well, but until Dorado is started the GPUs are idle and no memory is in use, like this:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A4000 Off | 00000000:65:00.0 Off | Off |
| 0% 37C P0 7W / 140W | 0MiB / 16376MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A4000 Off | 00000000:B3:00.0 Off | Off |
| 0% 39C P0 8W / 140W | 0MiB / 16376MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Hi @ericmsmall
Can you try setting the following environment variable and see if it helps?
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:25
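For example, in a bash shell you can set it just for the Dorado invocation (command trimmed to the essentials):

# Apply the allocator option only to this run
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:25 \
  dorado basecaller [email protected] pod5s/ -v --device cuda:0,1 >calls.bam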
Apologies, I forgot that I had already tried this based on other folks' posts. Setting this as an environment variable, or passing it directly to PyTorch, had no effect.
A couple more things to try to narrow down the issue (see the command sketch after this list):
- Try without the mod bases to see if plain basecalling shows the same problem.
- Try one modbase at a time (instead of both 6mA and 5mC_5hmC).
- Manually set the batch size to something smaller, like -b 1536 or -b 1792.
- Increase max_split_size_mb to something larger, like 64/128.
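A rough sketch of those runs as commands, using the flag values above and the model/paths from your original command (output file names are just placeholders; other options trimmed):

# 1. Plain basecalling, no modified bases
dorado basecaller [email protected] pod5s/ -v >calls_plain.bam

# 2. One modbase at a time
dorado basecaller [email protected] pod5s/ -v --modified-bases 6mA >calls_6mA.bam
dorado basecaller [email protected] pod5s/ -v --modified-bases 5mC_5hmC >calls_5mC_5hmC.bam

# 3. Smaller manual batch size
dorado basecaller [email protected] pod5s/ -v -b 1536 >calls_b1536.bam

# 4. Larger max_split_size_mb
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 dorado basecaller [email protected] pod5s/ -v >calls_split128.bam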
Closing as there's been no reply.