Jiatong (Julius) Han

Results 220 comments of Jiatong (Julius) Han

Unfortunately, you ran out of memory (OOM) on your machine. Two 3090 GPUs (24 GB each) plus your current main memory might not be enough to train a 7B model.
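A back-of-envelope estimate shows why (a sketch, assuming fp16 weights with fp32 Adam optimizer states, as in typical mixed-precision training; actual usage also includes gradients and activations):

```python
# Rough memory estimate for a 7B-parameter model under mixed precision.
params = 7e9
fp16_weights = params * 2       # 2 bytes per fp16 weight -> 14 GB
fp32_master = params * 4        # fp32 master copy of weights -> 28 GB
adam_states = params * 4 * 2    # fp32 Adam momentum + variance -> 56 GB
total_gb = (fp16_weights + fp32_master + adam_states) / 1e9
print(round(total_gb))  # 98
```

Roughly 98 GB of model state alone, far beyond 2 x 24 GB of GPU memory, which is why offloading to (sufficient) main memory or more GPUs is needed.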

Have you tried installing from source? Alternatively, try `CUDA_EXT=1 pip install colossalai` to install the library. If you have solved the issue, kindly share your approach for new...

The port might have been occupied. Can you try running with a different port number?
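Before relaunching, you can check whether a candidate port is free. A minimal sketch (the helper name `port_is_free` is hypothetical, not part of any library):

```python
import socket

def port_is_free(port: int) -> bool:
    """Return True if we can bind to the port on localhost, i.e. it is free."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR avoids false negatives from sockets in TIME_WAIT.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False
```

Pick any free high port and pass it as the master/launcher port instead of the default one.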

When running in a Docker environment, can you append `--network=host` to your `docker run` command?

Thanks @Honee-W for sharing. I understand the issue better now. `model = model.to(torch.cuda.current_device())` should suffice. Would this be useful for you, @Youly172?
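A minimal runnable sketch of that fix (assumes PyTorch is installed; `torch.nn.Linear` stands in for the actual model):

```python
import torch

# Move the model onto the current CUDA device. Module.to() accepts the
# integer device index returned by torch.cuda.current_device().
model = torch.nn.Linear(8, 8)
if torch.cuda.is_available():
    model = model.to(torch.cuda.current_device())
print(next(model.parameters()).device)
```

On a CPU-only machine the model simply stays on `cpu`, so the snippet is safe to run anywhere.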

Can you try creating a docker environment with this [file](https://github.com/hpcaitech/ColossalAI/blob/main/examples/images/diffusion/docker/Dockerfile)?

Can you take a look at issue #2487? It may help.

Hi, your torch version (1.8) is too old. Please upgrade torch, or use the version installed in your conda environment.

Can I know the contents of your `config` file?