Woosuk Kwon
cc @comaniac This PR seems to work correctly when using TP (or single GPU), but PP still generates gibberish outputs.
@comaniac @njhill @LiuXiaoxuanPKU I've updated the PR with some simplifications for spec decoding. PTAL.
cc @LiuXiaoxuanPKU
cc @youkaichao @bigPYJ1151 @bnellnm
@hmellor Thanks for bringing this up again. Yeah I think we don't need this PR anymore since our torch.compile integration is more mature now. I think we could turn it...
@catherinelee274 Closing this as #13210 is merged. Thanks!
@zoltan-fedor Thanks for reporting the bug. Could you please try running without `--num-scheduler-steps 8`? I think there were several bug fixes for it after v0.5.5.
@tylertitsworth Can you please take a look?
@gshtras @Alexei-V-Ivanov-AMD Thanks for the PR. Do you have any performance numbers before & after this PR?