Jiatong (Julius) Han

Results 220 comments of Jiatong (Julius) Han

Thanks @MikeChenfu for your follow-up questions. This [benchmark](https://github.com/hpcaitech/TensorNVMe#how-to-benchmark) may give an indication of how useful NVMe can be. But regarding the maximal stretch we may have, would @oahzxl have any...

Hi, thanks for your questions! Firstly, gradients are summed by the `allreduce()` operation and need to be divided by the number of ranks to get the mean value (Of course,...
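To make the sum-then-divide step concrete, here is a minimal sketch in plain Python that simulates the ranks as lists in a single process. The helpers `all_reduce_sum` and `average_gradients` are hypothetical names for illustration only; this is not the ColossalAI or PyTorch implementation, just the arithmetic described above.

```python
def all_reduce_sum(per_rank_grads):
    """Simulate an all-reduce with a sum: every rank ends up with the
    element-wise sum of all ranks' gradients (hypothetical helper)."""
    summed = [sum(values) for values in zip(*per_rank_grads)]
    return [list(summed) for _ in per_rank_grads]


def average_gradients(per_rank_grads):
    """Sum across ranks, then divide by the number of ranks to get the mean."""
    world_size = len(per_rank_grads)
    reduced = all_reduce_sum(per_rank_grads)
    return [[g / world_size for g in grads] for grads in reduced]


# Four simulated ranks, each holding a two-element gradient.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
averaged = average_gradients(grads)
print(averaged[0])  # every rank holds the same mean gradient
```

Without the division, each rank would hold the summed gradient, which grows with the number of ranks and effectively scales the learning rate.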

Yep, your reasoning makes sense to me. For ring-all-reduce, I believe it is a potential improvement to have. If you are interested in contributing to this project, you may benchmark...
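For readers unfamiliar with ring all-reduce, the following is a rough single-process simulation of the textbook algorithm (a reduce-scatter phase followed by an all-gather phase). It assumes each rank's vector has exactly one element per rank so that every chunk is a single number; `ring_all_reduce` is a hypothetical name, and this sketch says nothing about how ColossalAI does or would implement it.

```python
def ring_all_reduce(per_rank):
    """Simulate a ring all-reduce (sum) over n ranks.

    Assumption: each rank's vector has exactly n entries, one per chunk.
    """
    n = len(per_rank)
    data = [list(v) for v in per_rank]

    # Phase 1, reduce-scatter: at each step, rank r sends chunk (r - step) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot all sends first, since ranks send simultaneously.
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (idx, val) in enumerate(sends):
            data[(r + 1) % n][idx] += val

    # After phase 1, rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: pass the completed chunks around the ring,
    # overwriting the partial values.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (idx, val) in enumerate(sends):
            data[(r + 1) % n][idx] = val

    return data


inputs = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [1000, 2000, 3000, 4000],
]
result = ring_all_reduce(inputs)
print(result[0])
```

Each rank only ever talks to its right neighbour and sends one chunk per step, which is why the per-rank traffic stays roughly constant as ranks are added. Whether that helps in practice is what a benchmark would need to show.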

CUDA 12.0 may not be supported currently. Can you try reconfiguring the environment to use CUDA 11.x?

Can you please share the command you ran?

Please set `--nproc_per_node` to `4`.

They refer to vastly different concepts lol. What are you confused about? Can you give me some concrete failure samples? Otherwise, I'd suggest reading our official guide [here](https://colossalai.org/docs/concepts/paradigms_of_parallelism).

I believe it is only a test script and might not be intended to be fully functional. Can you try this [example](https://github.com/hpcaitech/ColossalAI/blob/v0.2.5/examples/language/gpt/gemini/train_gpt_demo.py) to test ZeRO?

Potentially the same issue as #3041.

Can you provide more error logs?