Allan Jie

Results 72 comments of Allan Jie

Yes. Removing `--use_kernel` makes it work. Yeah, I'm aware of DeepSpeed FastGen. I'm wondering how it handles batch size. Or should I simply write a for loop over the inputs?

I'm really confused: when you run PPO without SFT, for example on Narrative QA, how does the (quite small) model know it should generate an answer?