Connor Holmes comments

Results: 17 comments by Connor Holmes

[BUG] Incorrect Model Outputs When Using Beam Search

Hi @zelcookie, I've identified that the underlying reason for the low-quality outputs is that our KV-cache implementation is currently incompatible with `num_beams > 1`. When using beam search, the KV-cache associated...
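For readers unfamiliar with the interaction, here is a toy sketch of why beam search generally needs the per-beam cache to be reordered at each step. It is a general illustration, not a description of DeepSpeed's internals: the `reorder_cache` helper, the list-of-lists "cache", and the token values are all hypothetical stand-ins for real key/value tensors.

```python
# Toy illustration only: each row stands in for the cached keys/values of one
# beam. Real caches are tensors; reorder_cache is a hypothetical helper.

def reorder_cache(cache, beam_idx):
    """Return a new cache whose i-th row is a copy of its parent beam's row."""
    return [list(cache[parent]) for parent in beam_idx]

# Two beams, each with the history it has cached so far.
cache = [["the", "cat"], ["a", "dog"]]

# Assume the beam step decides that both surviving hypotheses extend beam 1.
beam_idx = [1, 1]
new_tokens = ["runs", "barks"]

# With reordering: every beam continues from the history it actually extends.
reordered = reorder_cache(cache, beam_idx)
correct = [row + [tok] for row, tok in zip(reordered, new_tokens)]

# Without reordering: beam 0 keeps a stale history that no longer matches
# the hypothesis it is supposed to represent.
stale = [row + [tok] for row, tok in zip(cache, new_tokens)]

print(correct)
print(stale)
```

In this sketch the stale variant leaves beam 0 attending to a history from a hypothesis that was dropped, which is one way a cache that ignores beam indices can degrade outputs.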

[BUG] Incorrect Model Outputs When Using Beam Search

Hi @hivaze, these outputs are very intriguing. Fundamentally, it's possible that we produce reasonable outputs with `num_beams > 1`, but in general it's a matter of luck when that happens. Currently, DeepSpeed-Inference...

[BUG] DeepSpeed non-deterministic inference with HF GPT2 when `replace_with_kernel_inject=True`

Hi @trianxy, I'm sorry for the lack of updates on this, but with the latest master (which should be released as 0.7.5 in the next few days) I believe the issue you're...

[BUG] DeepSpeed non-deterministic inference with HF GPT2 when `replace_with_kernel_inject=True`

> Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in versions `0.7.5+f2710bbe` BUT ALSO in `0.7.4`....

[BUG] deepspeed-inference seems not working correctly with torch.half on Pascal GPU

Hi @wkkautas, this PR https://github.com/microsoft/DeepSpeed/pull/2574 should fix the issue you are seeing. If you have time, please try it on your end to make sure it works as expected. Thanks!

[BUG] deepspeed-inference seems not working correctly with torch.half on Pascal GPU

If you are still seeing this issue, please reopen.

[FastGen] Hot-swappable LoRA adapters?

Thanks for the suggestion! I don't have a concrete timeline for something like this yet, but I do think this is a great feature for us to support moving forward and...