Boxiang Wang
Hi, could you provide your training code for us to reproduce this bug? Besides, could you double-check your dataset settings?
I tried our code with one simple change, swapping the model from resnet to shufflenet. It takes about 32521 MiB with `BATCH_SIZE = 16384`, and no OOM occurred.
Hi @songyuc, you can uninstall your current `colossalai` and install our latest version with

````
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt
# install colossalai...
````
Have you tried modifying the [.wslconfig](https://learn.microsoft.com/en-us/windows/wsl/wsl-config) file to give WSL more memory and more processors? It works for me.
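For reference, a minimal sketch of what such a `.wslconfig` might look like. The file lives at `%UserProfile%\.wslconfig` on the Windows side, and the `memory` and `processors` keys belong under the `[wsl2]` section. The values below are placeholders I picked for illustration, not a recommendation; adjust them to your machine.

```ini
# %UserProfile%\.wslconfig
# Example values only (assumptions); size them to your own hardware.
[wsl2]
memory=16GB      # upper limit on RAM the WSL 2 VM may use
processors=8     # number of logical processors given to the WSL 2 VM
```

After saving the file, run `wsl --shutdown` from PowerShell and start WSL again so the new limits take effect.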
Yes, this was an NVbug about NeMo 1.0. We are not going to save `.nemo` files in 2.0 for now.
@maanug-nv Can you help approve this again? It just passed all tests.
I don't think this change can be applied generally to all kinds of model loading. Maybe it should be added on a per-customer basis, as the need arises.