CUDA out of Memory, finetuning RealESRGAN_4X

Open SanKumSan opened this issue 1 year ago • 18 comments

2022-08-06 19:37:37,657 INFO: Loss [PerceptualLoss] is created.
2022-08-06 19:37:37,684 INFO: Loss [GANLoss] is created.
2022-08-06 19:37:37,718 INFO: Model [RealESRGANModel] is created.
2022-08-06 19:37:38,038 INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.27 GiB already allocated; 0 bytes free; 5.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
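The numbers in the message above can be checked against its own hint about `max_split_size_mb`. The following is a minimal sketch, not part of Real-ESRGAN or PyTorch: `parse_oom` is a hypothetical helper, and the regular expressions assume the exact wording of this particular message.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; "
    "5.27 GiB already allocated; 0 bytes free; 5.29 GiB reserved in total by PyTorch)"
)

# Multipliers that convert each unit used in the message to MiB.
UNIT_TO_MIB = {"bytes": 1 / (1024 * 1024), "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"


def parse_oom(message: str) -> dict:
    """Extract the sizes (in MiB) reported in an out-of-memory message (hypothetical helper)."""
    patterns = {
        "requested": rf"Tried to allocate {SIZE}",
        "total": rf"{SIZE} total capacity",
        "allocated": rf"{SIZE} already allocated",
        "free": rf"{SIZE} free",
        "reserved": rf"{SIZE} reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom(OOM_MESSAGE)
# Memory held by the caching allocator that is not backing live tensors.
slack_mib = sizes["reserved"] - sizes["allocated"]
print(f"reserved - allocated = {slack_mib:.1f} MiB")
print(f"allocated / total    = {sizes['allocated'] / sizes['total']:.0%}")
```

Reserved memory exceeds allocated memory by only about 20 MiB here, so it is not ">>" allocated memory. Under that reading, fragmentation is probably not the cause and tuning `max_split_size_mb` is unlikely to help much; the 6 GiB card is simply full, and the job's memory footprint has to shrink.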

Do I really need a minimum of 20 GB of GPU RAM to get started? Thanks

SanKumSan avatar Aug 06 '22 17:08 SanKumSan

Lower the batch size in your configuration file to 1 and the number of workers per GPU to 2.
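A minimal sketch of that change, done programmatically. The key names `batch_size_per_gpu` and `num_worker_per_gpu` and the sample excerpt are assumptions about what the options file looks like, so check them against your own file; `set_yaml_value` is a hypothetical helper, and editing the two numbers by hand in a text editor works just as well.

```python
import re

# Hypothetical excerpt of a fine-tuning options file. The key names
# batch_size_per_gpu and num_worker_per_gpu are assumptions.
SAMPLE_CONFIG = """\
datasets:
  train:
    use_shuffle: true
    num_worker_per_gpu: 5
    batch_size_per_gpu: 12
"""


def set_yaml_value(text: str, key: str, value: int) -> str:
    """Replace the integer value of `key`, leaving indentation and other lines intact."""
    pattern = re.compile(rf"^(\s*{re.escape(key)}:\s*)\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>{value}", text)
    if count == 0:
        raise KeyError(f"{key} not found in config")
    return new_text


updated = set_yaml_value(SAMPLE_CONFIG, "batch_size_per_gpu", 1)
updated = set_yaml_value(updated, "num_worker_per_gpu", 2)
print(updated)
```

A plain text substitution is used instead of a YAML parser so that comments and formatting in the file survive the edit.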

cliffordkleinsr avatar Aug 11 '22 09:08 cliffordkleinsr

Hey, I'm having this issue as well. Where can I find the configuration file?

mllyons2002 avatar Sep 12 '22 05:09 mllyons2002

In the Real-ESRGAN folder, go to the options directory. You should see a fine-tune Real-ESRGAN .yml file; that's the one.

cliffordkleinsr avatar Sep 12 '22 05:09 cliffordkleinsr

Hey, so I just changed the settings. Am I supposed to restart my device, or should it work right off the bat? I'm still getting the same issue.

mllyons2002 avatar Sep 12 '22 05:09 mllyons2002

Yup, lmk if it works

cliffordkleinsr avatar Sep 12 '22 06:09 cliffordkleinsr

Restarted my device and I'm still having the same issue

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

It says I should lower the --tile, but I've zero clue where it is or where to do it
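For context, as far as I can tell `--tile` is a command-line argument passed to the inference script rather than a setting in the training options file, so that message suggests the error comes from inference, not fine-tuning. The idea is to process the image in tiles so that only one tile has to fit in GPU memory at a time. The sketch below only illustrates how a tile size splits an image; `tile_boxes` and the 1000x600 image are hypothetical, and this is not the repository's implementation (which, among other things, also pads tiles).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Split a width x height image into tiles of at most tile x tile pixels."""
    if tile <= 0:
        # Treat 0 as "no tiling": the whole image is a single tile.
        return [(0, 0, width, height)]
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical 1000x600 image: smaller tiles mean more passes, each with fewer pixels.
for tile_size in (0, 512, 256):
    grid = tile_boxes(1000, 600, tile_size)
    largest = max((right - left) * (bottom - top) for left, top, right, bottom in grid)
    print(f"tile={tile_size}: {len(grid)} tiles, largest tile has {largest} pixels")
```

Lowering the tile size trades speed for peak memory: each forward pass sees fewer pixels, at the cost of more passes.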

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

What GPU are you using? What is the resolution of your dataset?

cliffordkleinsr avatar Sep 12 '22 06:09 cliffordkleinsr

RTX 3080 16 GB. Not sure what you mean by resolution of the dataset, though

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

The dimensions of the pictures in your dataset, like 512 x 512

cliffordkleinsr avatar Sep 12 '22 06:09 cliffordkleinsr

Idk 😭

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

Wait, I can check that, nvm lmfao

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

:p

cliffordkleinsr avatar Sep 12 '22 06:09 cliffordkleinsr

2048x2048

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

Also, is there a reason why it allocates 13 GB of VRAM lmfao

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002

It has to do with how torch grows memory per batch; it isn't handled as well as in TensorFlow. Maybe give me a chat over at Discord, username: clidegamer254#2034
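A rough back-of-the-envelope estimate also helps explain a number like 13 GB for 2048x2048 images. This is only a sketch under stated assumptions that are not from this thread: float32 activations, 64 feature channels inside the network, and 4x upscaling. `feature_map_gib` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def feature_map_gib(height: int, width: int, channels: int) -> float:
    """Size in GiB of one float32 tensor of shape (1, channels, height, width)."""
    return height * width * channels * BYTES_PER_FLOAT32 / GIB


# Assumptions (not from the thread): 64 feature channels, 4x upscaling, float32.
input_side = 2048
per_feature_map = feature_map_gib(input_side, input_side, channels=64)
output_image = feature_map_gib(4 * input_side, 4 * input_side, channels=3)

print(f"one 64-channel feature map at 2048x2048: {per_feature_map:.2f} GiB")
print(f"one 3-channel output at 8192x8192:       {output_image:.2f} GiB")
```

Under those assumptions a single intermediate feature map is already about 1 GiB, so a handful of them alive at the same time would plausibly reach double-digit gigabytes. Smaller inputs or tiles shrink this roughly in proportion to the pixel count.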

cliffordkleinsr avatar Sep 12 '22 06:09 cliffordkleinsr

Let's see if we can resolve that issue :p

cliffordkleinsr avatar Sep 12 '22 06:09 cliffordkleinsr

Added right now

mllyons2002 avatar Sep 12 '22 06:09 mllyons2002