NAFNet

Not enough memory available to process your request

Open · clasking2 opened this issue 2 years ago • 5 comments

Pasted log:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(**payload)
  File "predict.py", line 79, in predict
    single_image_inference(model, inp, str(out_path))
  File "predict.py", line 101, in single_image_inference
    model.test()
  File "/src/basicsr/models/image_restoration_model.py", line 247, in test
    pred = self.net_g(self.lq[i:j])
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/src/basicsr/models/archs/NAFNet_arch.py", line 141, in forward
    x = encoder(x)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/src/basicsr/models/archs/NAFNet_arch.py", line 62, in forward
    x = self.norm1(x)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/src/basicsr/models/archs/arch_util.py", line 300, in forward
    return LayerNormFunction.apply(x, self.weight, self.bias, self.eps)
  File "/src/basicsr/models/archs/arch_util.py", line 271, in forward
    var = (x - mu).pow(2).mean(1, keepdim=True)
RuntimeError: CUDA out of memory. Tried to allocate 4.27 GiB (GPU 0; 14.58 GiB total capacity; 10.46 GiB already allocated; 213.31 MiB free; 13.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

clasking2 · Oct 10 '23 12:10
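The stack trace shows the OOM happening inside NAFNet's LayerNorm while net_g runs the encoder at test time, and the error message itself suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF when reserved memory is much larger than allocated memory. A minimal sketch of applying that hint, assuming it runs at the very top of predict.py before CUDA is initialized (the 128 value is illustrative, not taken from this thread):

```python
# Sketch: apply the allocator hint from the error message.
# The variable must be set before torch initializes CUDA in this process.
import os

# max_split_size_mb caps how large a cached block the allocator will split,
# which can reduce fragmentation; 128 is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the setting is picked up
```

Note this only helps when reserved memory far exceeds allocated memory (fragmentation); if the model genuinely needs more than the available 14.58 GiB, the real fix is reducing the batch or input size, as discussed in the comments below.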

Same problem. Is there any solution?

hayk-manukyan-dev · Nov 27 '23 16:11

> Same problem. Is there any solution?

I ran into the same issue. I changed the batch size to a smaller value, 4, which solved the problem.

ZhaoYeung · Dec 01 '23 05:12

> Same problem. Is there any solution?
>
> I ran into the same issue. I changed the batch size to a smaller value, 4, which solved the problem.

I am trying to run the test part, and there is no batch size option in the test .yml files. Can you please help with that case?

hayk-manukyan-dev · Dec 03 '23 17:12

> Same problem. Is there any solution?
>
> I ran into the same issue. I changed the batch size to a smaller value, 4, which solved the problem.
>
> I am trying to run the test part, and there is no batch size option in the test .yml files. Can you please help with that case?

[screenshot: 1701761162872]

ZhaoYeung · Dec 05 '23 07:12
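The screenshot presumably points at the batch size setting in the options .yml. As a hedged sketch of where that usually lives in a NAFNet/BasicSR-style config (key names follow the usual BasicSR layout; the value mirrors the "batch size of 4" suggestion above and is otherwise illustrative):

```yaml
# Illustrative fragment of a NAFNet/BasicSR options .yml (not a full config).
datasets:
  train:
    batch_size_per_gpu: 4   # smaller batch size to fit GPU memory
```

As noted above, the test-only .yml files expose no dataloader batch size; peak memory at inference is driven mainly by the input resolution, so feeding smaller images or crops is the analogous knob there.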

> Same problem. Is there any solution?
>
> I ran into the same issue. I changed the batch size to a smaller value, 4, which solved the problem.
>
> I am trying to run the test part, and there is no batch size option in the test .yml files. Can you please help with that case?
>
> [screenshot: 1701761162872]

I thought train and test didn't interact with each other, so changing the train settings would have no effect on the test commands. Now I get you, thank you very much 👍 👍 👍

hayk-manukyan-dev · Dec 05 '23 21:12