
p2ch12: memory explosion when I train the balanced model with the same config as in p2ch11

Open icegomic opened this issue 3 years ago • 2 comments

I use the same config as in p2ch11 (epochs=1, num_workers=8, batch_size=32), and there it works well: RAM stays stable at about 6 GB. But when I run `python -m p2ch12.training --balanced`, RAM usage grows very quickly and soon exceeds the maximum; after that my computer stopped responding and I had to restart it. What is happening?
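For reference, here is a rough monitoring sketch (my own, not part of dlwpt-code) that I would run in a second terminal to see how the RAM is split between the main training process and the DataLoader workers. It assumes psutil is installed and that you pass it the PID of the training process.

```python
# mem_watch.py -- debugging sketch, not from the book's code.
# Prints the resident memory (RSS) of a process and of its children
# (the DataLoader worker processes) every few seconds.
import sys
import time

import psutil  # pip install psutil


def watch(pid, interval_s=5.0):
    proc = psutil.Process(pid)
    while proc.is_running():
        children = proc.children(recursive=True)
        child_rss = 0
        for child in children:
            try:
                child_rss += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # a worker exited between listing and querying
        main_gib = proc.memory_info().rss / 2**30
        print(f"main: {main_gib:5.1f} GiB | "
              f"{len(children)} workers: {child_rss / 2**30:5.1f} GiB total")
        time.sleep(interval_s)


if __name__ == "__main__":
    watch(int(sys.argv[1]))  # usage: python mem_watch.py <training PID>
```

If the worker total keeps climbing while the main process stays flat, that points at what the worker processes hold onto rather than at the model itself.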

icegomic avatar Jun 19 '22 13:06 icegomic

I also run into the memory explosion in p2ch11, during validation. My machine has 32 GB of RAM, and I have no idea what is happening.

Va6lue avatar Mar 18 '23 03:03 Va6lue

I have the same issue with the p2ch11 code: memory usage explodes during validation. I am also very interested in what causes this, and in why the DataLoader sometimes uses GPU memory efficiently and sometimes floods host RAM beforehand.

Side note: after I implemented my own model architecture on top of this code for practice, memory explodes during both training and validation, and I have to tweak the number of workers and the batch size to get a run through.
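For what it's worth, these are the DataLoader arguments that decide how much host RAM the loading pipeline can consume. The snippet is a generic sketch with a small dummy dataset, not the book's LunaDataset, just to show which knobs I mean:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Small dummy dataset standing in for the book's LunaDataset,
# just to illustrate the DataLoader arguments.
ds = TensorDataset(torch.randn(2_000, 1, 48, 48),
                   torch.zeros(2_000, dtype=torch.long))

loader = DataLoader(
    ds,
    batch_size=16,            # every prefetched batch sits in host RAM
    num_workers=4,            # each worker process gets its own copy of the dataset object
    prefetch_factor=2,        # batches each worker prepares ahead of time
    pin_memory=True,          # page-locked host buffers for faster copies to the GPU
    persistent_workers=True,  # keep workers (and their memory) alive across epochs
)

for imgs, labels in loader:
    pass  # training/validation step goes here
```

So roughly num_workers * prefetch_factor batches are buffered in RAM at any time, on top of whatever each worker's copy of the dataset holds, which is why lowering the worker count or the batch size "fixes" it.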

Edit: maybe this will help: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/
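The short version of that post, as I understand it: with num_workers > 0 and the default fork start method on Linux, every worker inherits the dataset object via copy-on-write, but merely reading Python lists/dicts/tuples touches their reference counts, so the "shared" pages get written to and duplicated in every worker, and RAM grows with the number of workers. A minimal sketch of the usual mitigation (my own illustration, not code from the book): keep bulky per-sample metadata in numpy arrays instead of lists of Python objects.

```python
import numpy as np
from torch.utils.data import Dataset


class ListBackedDataset(Dataset):
    """Holds one Python tuple (plus strings/ints) per sample. Reading these
    from a worker bumps their refcounts, so the copy-on-write pages get
    duplicated in every worker process."""

    def __init__(self, n=1_000_000):
        self.samples = [(f"series_{i}", i % 2) for i in range(n)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        uid, label = self.samples[idx]
        return uid, label


class ArrayBackedDataset(Dataset):
    """Holds the same data in two numpy arrays. The array buffers contain no
    per-element Python objects, so reading them from workers leaves the
    shared pages untouched and per-worker RAM stays flat."""

    def __init__(self, n=1_000_000):
        self.uids = np.array([f"series_{i}" for i in range(n)])  # fixed-width string array
        self.labels = np.arange(n, dtype=np.int64) % 2

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return str(self.uids[idx]), int(self.labels[idx])
```

Whether that is exactly what bites p2ch11/p2ch12 I cannot say, but it matches the symptom of RAM growing with the number of workers.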

donnoc avatar Feb 04 '24 20:02 donnoc