Kun Chen comments

5 comments by Kun Chen

[BUG] Tensors are on different devices when model.step()

Yes, I have the same issue with DeepSpeed 0.14.1, so I did this:

```shell
# Roll back from DeepSpeed 0.14.1 to 0.14.0
pip uninstall deepspeed
pip install deepspeed==0.14.0
```

After using the DeepSpeed version of...

When using pure DeepSpeed ulysses and zero stage 3 to continue pre-training, the loss gap between each GPU is too large.[BUG]

> @Kwen-Chen, your input data processing looks good to me. As for your second and third questions, you need a sequence-parallel-aware loss calculation ([see example here](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/core/sequence_parallel/cross_entropy.py)). Thanks for your...
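A minimal pure-Python sketch of what a sequence-parallel-aware loss has to compute. It assumes each rank holds one shard of the sequence and that the label `-100` marks tokens excluded from the loss. The `IGNORE_INDEX` constant and all function names are hypothetical, and this is not a copy of the linked Megatron-DeepSpeed file. The global loss is the sum of valid-token losses over all shards divided by the total valid-token count. A plain average of per-rank means drifts from that value whenever shards hold different numbers of valid tokens, which is one way per-GPU losses can look far apart. In a real run the two sums would be combined with an all-reduce (for example `torch.distributed.all_reduce`) over the sequence-parallel group. Here, a Python `sum` over a list of shards stands in for that collective.

```python
from typing import List, Sequence, Tuple

# Assumption: labels equal to IGNORE_INDEX (e.g. padding) are excluded from the loss.
IGNORE_INDEX = -100

Shard = Tuple[Sequence[float], Sequence[int]]  # (per-token losses, labels) on one rank


def local_loss_stats(token_losses: Sequence[float], labels: Sequence[int]) -> Tuple[float, int]:
    """Sum of losses and number of valid tokens in one rank's sequence shard."""
    total, count = 0.0, 0
    for loss, label in zip(token_losses, labels):
        if label != IGNORE_INDEX:
            total += loss
            count += 1
    return total, count


def sp_aware_mean_loss(shards: List[Shard]) -> float:
    """Token-weighted mean over the whole sequence, as if one rank saw all of it."""
    stats = [local_loss_stats(losses, labels) for losses, labels in shards]
    # These two sums stand in for an all-reduce(SUM) across the sequence-parallel group.
    global_sum = sum(s for s, _ in stats)
    global_count = sum(c for _, c in stats)
    return global_sum / max(global_count, 1)


def mean_of_rank_means(shards: List[Shard]) -> float:
    """Naive alternative: average the per-rank mean losses."""
    means = []
    for losses, labels in shards:
        s, c = local_loss_stats(losses, labels)
        means.append(s / max(c, 1))
    return sum(means) / len(means)


if __name__ == "__main__":
    shards = [
        ([2.0, 2.0, 2.0, 2.0], [11, 12, 13, 14]),  # rank 0: 4 valid tokens
        ([4.0, 0.0, 0.0, 0.0], [15, IGNORE_INDEX, IGNORE_INDEX, IGNORE_INDEX]),  # rank 1: 1 valid token
    ]
    print(sp_aware_mean_loss(shards))  # 2.4
    print(mean_of_rank_means(shards))  # 3.0
```

With four valid tokens of loss 2.0 on rank 0 and a single valid token of loss 4.0 on rank 1, the token-weighted loss is 12 / 5 = 2.4, while averaging the two rank means gives 3.0.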

[BUG] Sequence Parallel(Ulysses) Training Gradient Scaling Issue

> When training a language model (LM) with DeepSpeed's Sequence Parallel (Ulysses), it's typical to get a cross-entropy loss for each rank. To compute the gradients accurately, as [I understand...
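The scaling question can be seen from per-token gradient weights alone. Assume R sequence-parallel ranks, where rank r has n_r valid tokens. Averaging per-rank mean losses gives each token on rank r a gradient weight of 1/(R * n_r). The token-weighted global mean gives every token 1/N, with N = n_1 + ... + n_R. The two coincide only when all n_r are equal. The sketch below uses hypothetical function names and exact arithmetic via `fractions.Fraction`. It only computes those weights and does not model how DeepSpeed itself averages gradients across ranks.

```python
from fractions import Fraction
from typing import List


def weights_mean_of_rank_means(counts: List[int]) -> List[Fraction]:
    """d(loss)/d(token loss) per rank when loss = (1/R) * sum_r (1/n_r) * sum_i l_{r,i}."""
    num_ranks = len(counts)
    return [Fraction(1, num_ranks * n) for n in counts]


def weights_global_token_mean(counts: List[int]) -> List[Fraction]:
    """d(loss)/d(token loss) per rank when loss = (1/N) * sum_{r,i} l_{r,i}, N = sum_r n_r."""
    total = sum(counts)
    return [Fraction(1, total) for _ in counts]


if __name__ == "__main__":
    counts = [4, 1]  # valid tokens on rank 0 and rank 1
    print(weights_mean_of_rank_means(counts))  # [Fraction(1, 8), Fraction(1, 2)]
    print(weights_global_token_mean(counts))   # [Fraction(1, 5), Fraction(1, 5)]
```

With four valid tokens on rank 0 and one on rank 1, the lone token on rank 1 gets weight 1/2 versus 1/8 for each token on rank 0, while the global mean weights all five tokens at 1/5.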

Does not open properly on deepin

Related code

[BUG] Tensors are on different devices when model.step()

When using pure DeepSpeed ulysses and zero stage 3 to continue pre-training, the loss gap between each GPU is too large.[BUG]

[BUG] Sequence Parallel(Ulysses) Training Gradient Scaling Issue