Jiatong (Julius) Han comments

Results: 220 comments of Jiatong (Julius) Han

[BUG]: Small memory saving on NVME

Thanks @MikeChenfu for your follow-up questions. This [benchmark](https://github.com/hpcaitech/TensorNVMe#how-to-benchmark) may indicate how useful NVME can be. But regarding the maximum gain we can expect, would @oahzxl have any...

The divisible factor of local world size in sequence parallelism.

Hi, thanks for your questions! Firstly, gradients are summed by the `allreduce()` operation and need to be divided by the number of ranks to get the mean value (Of course,...
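
To make the sum-then-divide step concrete, here is a minimal single-process sketch. It only simulates the ranks with plain Python lists; `allreduce_sum` and `average_gradients` are hypothetical helper names for illustration, not functions from ColossalAI or `torch.distributed`.

```python
def allreduce_sum(per_rank_grads):
    """Simulate a sum all-reduce: every rank ends up with the element-wise sum."""
    summed = [sum(values) for values in zip(*per_rank_grads)]
    # Each simulated rank receives its own copy of the summed gradient.
    return [list(summed) for _ in per_rank_grads]


def average_gradients(per_rank_grads):
    """Sum gradients across ranks, then divide by the number of ranks."""
    world_size = len(per_rank_grads)
    reduced = allreduce_sum(per_rank_grads)
    return [[g / world_size for g in grads] for grads in reduced]


# Four simulated ranks, each holding a local gradient with two elements.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
averaged = average_gradients(grads)
```

Without the division, every rank would hold the summed gradient, which is larger than the mean by a factor equal to the number of ranks.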

The divisible factor of local world size in sequence parallelism.

Yep, your reasoning makes sense to me. For ring-all-reduce, I believe it is a potential improvement to have. If you are interested in contributing to this project, you may benchmark...
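
For readers unfamiliar with ring all-reduce, below is a toy single-process simulation of the textbook algorithm (a reduce-scatter phase followed by an all-gather phase). It is a sketch for illustration only, not the project's implementation; `ring_allreduce` is a hypothetical name, and the vector length is assumed to be divisible by the number of ranks.

```python
def ring_allreduce(rank_data):
    """Simulate ring all-reduce (sum) over a list of equal-length vectors."""
    n = len(rank_data)
    length = len(rank_data[0])
    assert length % n == 0, "assumption: vector length divisible by rank count"
    size = length // n
    # chunks[r][c] is chunk c of the vector held by rank r.
    chunks = [[list(v[c * size:(c + 1) * size]) for c in range(n)]
              for v in rank_data]

    # Phase 1: reduce-scatter. At each step, rank r sends chunk (r - step) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [list(chunks[r][c]) for r, c in sends]  # snapshot before writes
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], payload)]

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: all-gather. Rank r forwards chunk (r + 1 - step) % n to its
    # right neighbour, which overwrites its copy with the reduced values.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [list(chunks[r][c]) for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            chunks[(r + 1) % n][c] = payload

    # Flatten each rank's chunks back into one vector.
    return [[x for chunk in rank_chunks for x in chunk] for rank_chunks in chunks]


result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

In this simulation each rank sends only one chunk per step to its right neighbour, so the data sent per rank stays close to twice the vector size regardless of the number of ranks.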

ModuleNotFoundError: No module named 'colossalai._C.cpu_adam'

CUDA 12.0 may not be supported yet. Can you try reconfiguring the environment to use CUDA 11.x?

[BUG]: start titan example too slow

Can you please share the command you ran?

[BUG]: start titan example too slow

Please set `--nproc_per_node` to `4`.

[BUG]: start titan example too slow

They are referring to vastly different concepts lol. What are you confused about? Can you give me some concrete failure samples? Otherwise, I'd suggest reading our official guide [here](https://colossalai.org/docs/concepts/paradigms_of_parallelism).

[BUG]: ZeRO cannot use pretrained weights

I believe it is only a test script and might not be intended to be fully functional. Can you try this [example](https://github.com/hpcaitech/ColossalAI/blob/v0.2.5/examples/language/gpt/gemini/train_gpt_demo.py) to test out ZeRO?

No module named 'colossalai._C.fused_optim'

Potentially the same issue as #3041.

No module named 'colossalai._C.fused_optim'

Can you provide more error logs?