SCTNet

CUDA out of Memory

Open · maopengcheng924 opened this issue 1 year ago · 5 comments

I am using 2 NVIDIA A100 80GB PCIe GPUs (160 GB of GPU memory in total), but I still get a CUDA out of memory error. I did not change your code: I just ran git clone, pip install, and then python train. So I am confused. Could you please tell me something about your experiment environment?

maopengcheng924, Jul 02 '24 09:07

I use Python 3.11.9 with 2 x A100 40GB without any problems. The curious thing is that I get lower results:

0: Average PSNR_mu: 43.8158 PSNR_l: 41.5196
0: Average SSIM_mu: 0.9917 SSIM_l: 0.9884

(attached: requirements.txt)
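For readers comparing these numbers: in the HDR deghosting literature, PSNR_l usually means PSNR on the linear HDR output, and PSNR_mu means PSNR after μ-law tonemapping both the prediction and the ground truth, commonly with μ = 5000. The sketch below assumes that convention and images normalized to [0, 1]; it is not taken from the SCTNet evaluation code, so check the repository's own metric script before comparing results against the paper.

```python
import numpy as np

# Assumption: mu = 5000, the value commonly used in HDR deghosting benchmarks.
MU = 5000.0


def mu_tonemap(hdr, mu=MU):
    """Compress linear HDR values in [0, 1] with the mu-law curve (maps 0->0, 1->1)."""
    return np.log1p(mu * hdr) / np.log1p(mu)


def psnr(pred, target, peak=1.0):
    """Standard PSNR in dB for images whose maximum possible value is `peak`."""
    mse = np.mean((np.asarray(pred, dtype=np.float64) - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


def psnr_l(pred, target):
    # PSNR computed directly on the linear HDR values
    return psnr(pred, target)


def psnr_mu(pred, target):
    # PSNR computed after mu-law tonemapping both images
    return psnr(mu_tonemap(pred), mu_tonemap(target))
```

Because the μ-law curve is steepest near zero, PSNR_mu weighs errors in dark regions more heavily than PSNR_l does.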

josair21, Jul 03 '24 14:07


Hello, could you tell me what batch size you set? On an 80 GB A100 I can only run training with a batch size of 1. Thank you very much.

maopengcheng924, Aug 23 '24 08:08

It is set to 22, running on an HPC cluster. I made a virtual environment in order to run it; I have attached my Singularity definition file: pytorch_image.txt

By the way, you can check whether HDR Transformers runs OK on your GPU; it is basically the same process.

josair21, Aug 23 '24 13:08

I have a 4090 GPU.

Even batch_size=1 gives an OOM (out of memory) error.

How can I train it then?

ManuBN786, Aug 14 '25 09:08

I made some modifications to gen_crop_data.py to create smaller patches, and now I am able to train on my 4090 GPU.

ManuBN786, Aug 18 '25 09:08