Jiatong (Julius) Han

Results: 218 comments by Jiatong (Julius) Han

Can you upgrade your PyTorch to 1.13 (ideally via `conda`) as per [this](https://pytorch.org/blog/deprecation-cuda-python-support/)?

So @Aadedd, to summarize, CUDA 11.7 + torch 1.13 worked for you?

Can you try reinstalling PyTorch to 1.13?

Hi, can you provide your environment settings via `colossalai -i`?
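
If the CLI is not available, here is a minimal sketch for gathering similar details by hand. It is not what the `colossalai` CLI does internally; `parse_version`, `installed_version`, and `environment_report` are hypothetical helpers written for illustration, and the 1.13 threshold is simply the PyTorch version suggested above. It uses only the standard library, so it should run even when `torch` is missing or broken.

```python
import os
import platform
from importlib import metadata


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); stops at the first non-numeric part."""
    core = text.split("+", 1)[0]  # drop a local suffix such as '+cu117'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("torch", "colossalai"), min_torch=(1, 13)):
    """Collect basic environment details into a dict (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    }
    for name in packages:
        report[name] = installed_version(name) or "<not installed>"
    torch_version = installed_version("torch")
    # Assumption: 1.13 is the minimum, taken from the suggestion above.
    report["torch_meets_minimum"] = (
        torch_version is not None and parse_version(torch_version) >= min_torch
    )
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives roughly the information needed to compare the installed PyTorch against the suggested version.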

Hi, sorry for getting to this late. Would this issue https://github.com/hpcaitech/ColossalAI/issues/3496 be of any help?

Can you share your environment settings and try adding `--network=host` to your training command?

Hi, can you remove `--runtime=nvidia` and try again? Also, please take a look at this [install guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

Actually, I cannot reproduce your issue. Could you try following [these instructions](https://github.com/hpcaitech/ColossalAI#build-on-your-own)?

Sorry @captainst, I believe this example can only run with 4 GPUs. Can you try modifying [this line](https://github.com/hpcaitech/ColossalAI/blob/dbc01b9c0479a6fd3fb04450b9dc01b5162d8c0d/examples/tutorial/auto_parallel/auto_parallel_with_resnet.py#L28) to fit your own setup with only 2 GPUs?