Hongxin Liu
I've updated the requirements. Could you try the latest code?
Hi, are you using the original DiT pretrained weights? We modified the model definition, so the original weights cannot be loaded. You can save compatible pretrained weights using our training script.
Hi, gradients are managed by the ZeRO optimizer, so `p.grad` is None. This is expected behavior. If you want to check grads manually, you can refer to https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164 Note that the...
Yes. We will add a flag to control it in the next release.
This is because Docker BuildKit is not compatible with the current CUDA extension build. You can set `export FORCE_CUDA=1` before installing colossalai in Docker. Alternatively, you can disable Docker BuildKit by setting...
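For example, the two workarounds look roughly like this (a sketch; the Dockerfile contents and image tag are placeholders for your own setup):

```shell
# Option 1: force building the CUDA extension even though no GPU
# is visible during the docker build. In your Dockerfile:
#   ENV FORCE_CUDA=1
#   RUN pip install colossalai

# Option 2: disable BuildKit for this build instead:
DOCKER_BUILDKIT=0 docker build -t my-image .
```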
EP for Deepseek V3 is implemented, see our latest [blog](https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic).
> need FP8 training deepseek-MOE

The FP8 GEMM kernel released in DeepSeek's GitHub repo is currently sometimes less efficient than the BF16 GEMM provided by cuBLAS. We will release blockwise FP8 training...
We've updated the CI environment. Could you rebase onto the main branch and rerun the tests?
The sample command is for 3x8 GPUs, but you only have 8 GPUs. Adjust the ep size or pp size so that their product divides your number of GPUs.
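As a quick sanity check, you can verify the divisibility condition like this (an illustrative snippet; `ep_size`/`pp_size` are just the values you pass to the script, and the simplified rule here is that their product must divide the GPU count):

```python
def check_parallel_config(num_gpus: int, ep_size: int, pp_size: int) -> bool:
    """Simplified check: the product of the parallel sizes
    must evenly divide the number of available GPUs."""
    return num_gpus % (ep_size * pp_size) == 0

# The sample command targets 3x8 = 24 GPUs:
print(check_parallel_config(24, ep_size=8, pp_size=3))  # True
# With only 8 GPUs, the same sizes do not fit:
print(check_parallel_config(8, ep_size=8, pp_size=3))   # False
# Shrinking pp_size to 1 makes it fit on 8 GPUs:
print(check_parallel_config(8, ep_size=8, pp_size=1))   # True
```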
It seems that all underscores are missing from your command. What's your default shell?