Connor Holmes

Results: 17 comments by Connor Holmes

Hi @zelcookie, I've identified that the underlying reason for the low-quality outputs is that our KV-cache implementation is currently incompatible with `num_beams > 1`. When using beam search, the KV-cache associated...

Hi @hivaze, these outputs are very intriguing. Fundamentally, it's possible for us to produce reasonable outputs with `num_beams > 1`, but in general it's a matter of luck when that happens. Currently, DeepSpeed-Inference...
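
To illustrate in general terms why beam search and a per-slot KV-cache can interact badly, here is a minimal toy sketch. It is not DeepSpeed's implementation: the "cache" is just a list of tokens per beam slot, and `reorder_cache`, `step`, and `beam_idx` are hypothetical names chosen for illustration. The assumption is the usual one for beam search: after each step, every slot continues from some parent beam, so the cached history has to follow the parent.

```python
# Toy illustration only; this is NOT how DeepSpeed-Inference stores its KV-cache.
# Each beam slot's "cache" is simply the list of tokens that slot has seen so far.

def reorder_cache(cache, beam_idx):
    """Return a cache where slot i holds a copy of the history of parent beam beam_idx[i]."""
    return [list(cache[parent]) for parent in beam_idx]

def step(cache, beam_idx, new_tokens, reorder=True):
    """Advance one decoding step and return the new cache (the input is not mutated)."""
    if reorder:
        # Follow the beam-search decision: each slot inherits its parent's history.
        cache = reorder_cache(cache, beam_idx)
    else:
        # Ignore the beam-search decision: each slot keeps whatever history it had.
        cache = [list(history) for history in cache]
    for history, token in zip(cache, new_tokens):
        history.append(token)
    return cache

# Two beams with different histories.
cache = [["a", "b"], ["a", "c"]]
# Hypothetical beam-search decision: both surviving hypotheses extend parent beam 1.
beam_idx = [1, 1]
new_tokens = ["x", "y"]

correct = step(cache, beam_idx, new_tokens, reorder=True)
stale = step(cache, beam_idx, new_tokens, reorder=False)

print(correct)  # slot 0 continues from the history of parent beam 1
print(stale)    # slot 0 still carries the history of the dropped beam 0
```

In the `stale` case, slot 0 generates its next token against a history that belongs to a hypothesis beam search already discarded. With `num_beams = 1` the parent is always slot 0, so the mismatch cannot occur.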

Hi @trianxy, I'm sorry for the lack of updates on this, but with the latest master (should be released as 0.7.5 in the next few days) I believe the issue you're...

> Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in versions `0.7.5+f2710bbe` BUT ALSO in `0.7.4`....

Hi @wkkautas, this PR https://github.com/microsoft/DeepSpeed/pull/2574 should fix the issue you are seeing. If you have time, please try it on your end to make sure it works as expected. Thanks!

Thanks for the suggestion! I don't have a concrete timeline for something like this yet, but I do think this is a great feature for us to support moving forward and...