Jiatong (Julius) Han
Can you upgrade PyTorch to 1.13 (ideally via `conda`), as per [this](https://pytorch.org/blog/deprecation-cuda-python-support/)?
So @Aadedd, to summarize: CUDA 11.7 + torch 1.13 worked for you?
Can you try reinstalling PyTorch at version 1.13?
Hi, can you provide your environment settings via `colossalai -i`?
Hi, sorry for getting to this late. Would this issue be of any help? https://github.com/hpcaitech/ColossalAI/issues/3496
@chingfeng2021, has the issue been resolved?
Can you share your environment settings and try adding `--network=host` to your training command?
Hi, can you remove `--runtime=nvidia` and try again? And take a look at this [post](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
Actually, I cannot replicate your issue. Could you try building from source as per [this](https://github.com/hpcaitech/ColossalAI#build-on-your-own)?
Sorry @captainst, I believe this example can only run with 4 GPUs. Can you try modifying [this line](https://github.com/hpcaitech/ColossalAI/blob/dbc01b9c0479a6fd3fb04450b9dc01b5162d8c0d/examples/tutorial/auto_parallel/auto_parallel_with_resnet.py#L28) to fit your own setup with only 2 GPUs?