Woosuk Kwon

Results: 284 comments by Woosuk Kwon

cc @comaniac This PR seems to work correctly when using TP (or single GPU), but PP still generates gibberish outputs.

@comaniac @njhill @LiuXiaoxuanPKU I've updated the PR with some simplification for spec decoding. PTAL.

cc @youkaichao @bigPYJ1151 @bnellnm

@hmellor Thanks for bringing this up again. Yeah I think we don't need this PR anymore since our torch.compile integration is more mature now. I think we could turn it...

@catherinelee274 Closing this as #13210 is merged. Thanks!

@zoltan-fedor Thanks for reporting the bug. Could you please try running without `--num-scheduler-steps 8`? I think there were several bug fixes for it after v0.5.5.

@tylertitsworth Can you please take a look?

@gshtras @Alexei-V-Ivanov-AMD Thanks for the PR. Do you have any performance numbers before & after this PR?