Kun Chen

Results: 5 comments by Kun Chen

Anyone there? Is this project still being maintained?

You can upload the file directly.

Yes, I have the same issue when I use DeepSpeed version 0.14.1, so I did this:

```
pip uninstall deepspeed
pip install deepspeed==0.14.0
```

After using the deepspeed of...

> @Kwen-Chen, your input data processing looks good to me. As for your second and third questions, you need a sequence-parallel-aware loss calculation ([see example here](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/core/sequence_parallel/cross_entropy.py)).

Thanks for your...

> When training a language model (LM) with DeepSpeed's Sequence Parallel (Ulysses), it's typical to get a cross-entropy loss for each rank. To compute the gradients accurately, as [I understand...