Connor Holmes
Hi @zelcookie, I've identified the underlying reason for the low-quality outputs: our KV-cache implementation is currently incompatible with `num_beams > 1`. When using beam search, the KV-cache associated...
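As general background on why beam search and a KV-cache interact: after each step, beam search may keep several children of one beam and drop others. Each cache slot must then be gathered by its parent-beam index, or a hypothesis ends up attending to another beam's history. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical list-based cache (lists of tokens standing in for key/value tensors) and hypothetical helper names `reorder_kv_cache` and `append_step`. It is not DeepSpeed-Inference's implementation.

```python
from typing import List, Sequence

# Hypothetical layout: cache[b] holds the past entries for beam slot b.
# Real caches are tensors shaped roughly [num_beams, heads, seq_len, head_dim];
# plain lists of tokens stand in for them so this sketch stays runnable.
KVCache = List[List[str]]


def reorder_kv_cache(cache: KVCache, beam_idx: Sequence[int]) -> KVCache:
    """Gather cache rows so slot i continues the history of parent beam beam_idx[i]."""
    # Copy each row so two slots sharing a parent do not alias the same list.
    return [list(cache[parent]) for parent in beam_idx]


def append_step(cache: KVCache, new_entries: Sequence[str]) -> KVCache:
    """Append this step's entry to every beam slot."""
    return [row + [entry] for row, entry in zip(cache, new_entries)]


if __name__ == "__main__":
    cache = [["the", "cat"], ["the", "dog"]]
    # Suppose both surviving hypotheses descend from beam 1 ("the dog").
    beam_idx = [1, 1]
    print(append_step(reorder_kv_cache(cache, beam_idx), ["ran", "sat"]))
    # [['the', 'dog', 'ran'], ['the', 'dog', 'sat']]
```

Skipping the reorder step (calling `append_step(cache, ["ran", "sat"])` directly) yields `[['the', 'cat', 'ran'], ['the', 'dog', 'sat']]`. Slot 0 would then attend to the "cat" history even though its hypothesis is "the dog ran", which is the kind of mismatch that degrades beam-search outputs.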
Hi @hivaze, these outputs are very intriguing. It is possible for us to produce reasonable outputs with `num_beams > 1`, but in general that happens by luck. Currently, DeepSpeed-Inference...
Hi @trianxy, I'm sorry for the lack of updates on this, but with the latest master (which should be released as 0.7.5 in the next few days) I believe the issue you're...
> Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in versions `0.7.5+f2710bbe` BUT ALSO in `0.7.4`....
Hi @wkkautas, this PR https://github.com/microsoft/DeepSpeed/pull/2574 should fix the issue you are seeing. If you have time, please try it on your end to make sure it works as expected. Thanks!
If you are still seeing this issue, please reopen.
Thanks for the suggestion! I don't have a concrete timeline for something like this yet, but I do think this is a great feature for us to support moving forward and...