Reza Yazdani

Results 95 comments of Reza Yazdani

Hi @xiejw, Can you try this [PR](https://github.com/microsoft/DeepSpeed/pull/2916) using the kernels as well as mp>1 and see if it works for you? Thanks, Reza

Hi @xiejw, Thanks for trying this out. Let me try it on my side again and see if I can repro the same issue. Thanks, Reza

@xiejw, are you trying this the same way I described in the PR?

@xiejw, can you please try this again, passing `--replace_method 'auto'` when running `inference-test.py`?

Hi @sakogan, Thanks for the PR. Sorry for the delay in getting back to this. The issue that you raised here is valid and definitely needs to be addressed....

Hi @slundberg, The cache is controlled internally by DS-inference. As long as you pass the initial context (input-prompt) to DS-inference, it will consider the previous context as the KV-Cache...

Hi @mallorbc, Thanks for reporting this issue. I will try to see if I can repro this on my end. Thanks, Reza

Also, can I ask which PyTorch version you are using here?

Hi, I have fixed some bugs regarding the checkpoint loading for these model architectures. Could you please retry using this PR? You can also try our updated test-suite [here](https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py). Thanks,...