Jinhua Wang

Results 12 issues of Jinhua Wang

In the PPO implementation, the critic model appears to take both the prompt and the generated actions as input (or only the generated actions if pooled is true). However, if we...

Hi there! When training with sequence-parallel attention, I was wondering whether you scale the loss properly, since each GPU holds only a subset of...