DeepSpeedExamples
[chat] The generate process is not a single step in RL
Hi, I am from the ColossalAI team. I noticed that there are similarities between DeepSpeedChat and ColossalChat. We have found what may be an implementation error in our code, which could lead to convergence problems. If you referred to our implementation, you may encounter similar problems as well. We plan to fix the bug within the next two weeks.
Meanwhile, we would appreciate it if you could reference our work if you adapted from our implementation.