Hongxin Liu
I've updated the requirements. Could you try the latest code?
Hi, are you using the original DiT pretrained weights? We modified the model definition, so the original weights cannot be loaded. You can save compatible pretrained weights using our training script.
Hi, gradients are managed by the ZeRO optimizer, so `p.grad` is None. This is expected behavior. If you want to check grads manually, you can refer to https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164 Note that the...
Yes. We will add a flag to control it in the next release.
This is because Docker BuildKit is not compatible with the current CUDA extension build. You can set `export FORCE_CUDA=1` before installing colossalai in Docker. Alternatively, you can disable Docker BuildKit by setting...
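For example, the two workarounds look roughly like this (a sketch; the Dockerfile contents and image tag are placeholders for your own setup):

```shell
# Option 1: force building the CUDA extension even though no GPU
# is visible during the docker build. In your Dockerfile:
#   ENV FORCE_CUDA=1
#   RUN pip install colossalai

# Option 2: disable BuildKit for this build instead:
DOCKER_BUILDKIT=0 docker build -t my-image .
```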
EP for Deepseek V3 is implemented, see our latest [blog](https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic).
> need FP8 training deepseek-MOE

The FP8 GEMM kernel released in DeepSeek's GitHub repo is currently sometimes less efficient than the BF16 GEMM provided by cuBLAS. We will release blockwise FP8 training...
We've updated the CI environment. Could you rebase onto the main branch and rerun the tests?
The sample command is for 3x8 GPUs, but you only have 8 GPUs. Adjust the ep size or pp size so that their product divides your number of GPUs.
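As a quick sanity check, you can verify the divisibility condition like this (an illustrative snippet; `ep_size`/`pp_size` are just the values you pass to the script, and the simplified rule here is that their product must divide the GPU count):

```python
def check_parallel_config(num_gpus: int, ep_size: int, pp_size: int) -> bool:
    """Simplified check: the product of the parallel sizes
    must evenly divide the number of available GPUs."""
    return num_gpus % (ep_size * pp_size) == 0

# The sample command targets 3x8 = 24 GPUs:
print(check_parallel_config(24, ep_size=8, pp_size=3))  # True
# With only 8 GPUs, the same sizes do not fit:
print(check_parallel_config(8, ep_size=8, pp_size=3))   # False
# Shrinking pp_size to 1 makes it fit on 8 GPUs:
print(check_parallel_config(8, ep_size=8, pp_size=1))   # True
```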
It seems that all underscores are missing from your command. What's your default shell?