Real-ESRGAN

CUDA out of memory when fine-tuning

Open · HenryKang1 opened this issue 1 year ago • 1 comment

My images are 512 x 512. I set the scale to 1 and the GT size to 512. I also changed the crop/pad size from 400 to 512. The batch size is 1. Training from scratch with the "SRVGGNetCompact" architecture works. However, fine-tuning fails with the error below. Is there a solution? I cannot train ESRNet either, because of a CUDA error. I also tested on a GPU with 48 GB of memory and batch size 1; it still has the same issue.

Traceback (most recent call last):
  File "realesrgan/train.py", line 11, in <module>
    train_pipeline(root_path)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\train.py", line 169, in train_pipeline
    model.optimize_parameters(current_iter)
  File "d:\sr_code\real-esrgan\realesrgan\models\realesrgan_model.py", line 210, in optimize_parameters
    self.output = self.net_g(self.lq)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\archs\rrdbnet_arch.py", line 113, in forward
    body_feat = self.conv_body(self.body(feat))
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\archs\rrdbnet_arch.py", line 59, in forward
    out = self.rdb1(x)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\archs\rrdbnet_arch.py", line 35, in forward
    x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 24.00 GiB total capacity; 23.00 GiB already allocated; 0 bytes free; 23.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
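For readers trying to interpret the numbers in the traceback above, here is a rough back-of-the-envelope sketch in plain Python. It assumes the usual RRDBNet settings (num_feat=64, num_grow_ch=32, num_block=23, three dense blocks per RRDB) and float32 activations; none of these values are confirmed by the post, and the helper names are made up for illustration. Under those assumptions, the failed 128.00 MiB allocation matches the size of `torch.cat((x, x1, x2), 1)` at 512 x 512 with batch size 1, which suggests the network body is running at full 512 x 512 resolution.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def tensor_mib(channels, height, width, batch=1):
    """Size in MiB of a float32 tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * BYTES_PER_FLOAT32 / MIB


def rdb_saved_channels(num_feat=64, num_grow_ch=32):
    """Rough count of channel maps one dense block keeps for the backward pass.

    Assumption: autograd keeps the input of each of the five convolutions
    (x, cat(x, x1), ..., cat(x, x1, x2, x3, x4)) plus the four LeakyReLU
    outputs x1..x4. Real usage differs by framework version and workspace.
    """
    conv_inputs = sum(num_feat + i * num_grow_ch for i in range(5))
    lrelu_outputs = 4 * num_grow_ch
    return conv_inputs + lrelu_outputs


def body_activation_mib(height, width, num_block=23, batch=1):
    """Rough activation estimate for the whole RRDB body (3 dense blocks per RRDB)."""
    per_channel = tensor_mib(1, height, width, batch)
    return num_block * 3 * rdb_saved_channels() * per_channel


# Concatenation of x (64 ch), x1 (32 ch) and x2 (32 ch) at 512 x 512:
print(tensor_mib(64 + 32 + 32, 512, 512))        # 128.0 MiB, as in the error message
# Whole body at 512 x 512 versus 128 x 128 feature resolution, in GiB:
print(body_activation_mib(512, 512) / 1024)
print(body_activation_mib(128, 128) / 1024)
```

If this estimate is in the right ballpark, the body alone needs on the order of 50 GiB at 512 x 512, which would explain why a 48 GB card also fails. Note too that the message reports 23.09 GiB reserved against 23.00 GiB allocated, so fragmentation (and therefore max_split_size_mb) is unlikely to be the main problem here; the estimate shrinks in proportion to the pixel count of the features entering the body.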

HenryKang1, Apr 28 '23 20:04

Have you found a solution for this error?

dummyuser-123, Feb 09 '24 04:02