Boxiang Wang

Results: 8 comments by Boxiang Wang

Hi, could you provide your training code for us to reproduce this bug? Besides, could you double-check your dataset settings?

I tried our code with one simple change, swapping the model from resnet to shufflenet. It takes about 32521 MiB with `BATCH_SIZE = 16384`, and no OOM occurred.

Hi @songyuc, you can uninstall your current `colossalai` and install our latest version with

```
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt
# install colossalai...
```

Have you tried modifying the [.wslconfig](https://learn.microsoft.com/en-us/windows/wsl/wsl-config) file to allow more memory and more processors? It works for me.
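For reference, a minimal `.wslconfig` sketch, assuming WSL 2. The values are illustrative assumptions, not the settings used in this thread, so size them to your host machine. The file lives at `%UserProfile%\.wslconfig`, and changes take effect after running `wsl --shutdown` and restarting the distribution.

```ini
# Settings in this section apply to all WSL 2 distributions
[wsl2]
# Upper limit on memory for the WSL 2 VM (illustrative value)
memory=16GB
# Number of logical processors for the WSL 2 VM (illustrative value)
processors=8
```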

Yes, this was an NVbug about NeMo 1.0. We are not going to save `.nemo` in 2.0 right now.

@maanug-nv Can you help approve this again? It just passed all tests.

I don't think this change can be applied generally to all kinds of model loading. Maybe it should be added according to each customer's needs.