zekai

Results: 4 issues from zekai

I notice the training code stops me from using fp32 together with ZeRO-2, which can be traced to update_ema() in train_utils.py ``` if param.data.dtype != torch.float32 and isinstance(optimizer, LowLevelZeroOptimizer): param_id...
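For context, an `update_ema()` helper of this kind maintains an exponential moving average of the model's parameters after each optimizer step. The truncated snippet suggests the repo special-cases non-fp32 parameters under ColossalAI's `LowLevelZeroOptimizer`; the underlying EMA rule itself is simple. A minimal pure-Python sketch of that rule (names and the `decay` default are illustrative, not the repo's actual code):

```python
def update_ema(ema_params, model_params, decay=0.9999):
    """Blend current parameters into the EMA copy:
    ema = decay * ema + (1 - decay) * param."""
    for name, param in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * param
    return ema_params

# Toy example with plain floats instead of tensors.
ema = {"w": 1.0}
ema = update_ema(ema, {"w": 0.0}, decay=0.5)
print(ema["w"])  # 0.5
```

The dtype/optimizer check in the original code presumably exists because ZeRO-2 partitions optimizer state, so the EMA copy cannot be read off `param.data` directly for every parameter.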

Thank you for your great work :)! Here is my question: I tried to follow the instructions but failed at the flash-attention-related steps. According to [this issue](https://github.com/Dao-AILab/flash-attention/issues/148), V100...

question

Hi, I read the deepspeed docs and have the following questions: (1) What's the difference between these methods when running LLM inference? a. deepspeed.initialize and then write code to generate...

When I train on several objects for several epochs, the commit loss starts to become negative; the overall loss keeps going down, but neither the...
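A negative commit loss is usually a red flag: in VQ-style training the commitment term is a squared distance, beta * ||z_e - sg(z_q)||^2, which is non-negative by construction, so a negative value typically points to a sign or weighting bug in how the loss terms are combined. A minimal sketch of the standard form (plain Python instead of tensors; `beta` and the names are illustrative):

```python
def commit_loss(z_e, z_q, beta=0.25):
    """Commitment loss: mean squared distance between encoder outputs z_e
    and (stop-gradient) codebook vectors z_q, scaled by beta.
    Non-negative by construction."""
    sq = [(a - b) ** 2 for a, b in zip(z_e, z_q)]
    return beta * sum(sq) / len(sq)

print(commit_loss([1.0, 2.0], [0.0, 2.0]))  # 0.125
```

If the logged "commit loss" can go below zero, it is worth checking whether the code subtracts this term (or a codebook term) somewhere rather than adding it.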