Reza Yazdani
Hi @xiejw, Can you try this [PR](https://github.com/microsoft/DeepSpeed/pull/2916) using the kernels as well as mp>1 and see if it works for you? Thanks, Reza
Hi @xiejw, Thanks for trying this out. Let me try it on my side again and see if I can reproduce the same issue. Thanks, Reza
@xiejw, are you trying this the same way I described in the PR?
@xiejw, can you please try this again, passing `--replace_method 'auto'` when running with `inference-test.py`?
Hi @sakogan, Thanks for the PR. Sorry for the delay in getting back to this. The issue that you raised here is valid and definitely needs to be addressed....
Thanks, this now makes sense 👍
Hi @slundberg, The cache is controlled internally by DS-Inference. As long as you pass the initial context (input prompt) to DS-Inference, it will treat the previous context as the KV cache...
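To illustrate the idea (this is a toy sketch, not DeepSpeed's actual implementation — `fake_kv` and `KVCache` are hypothetical names): a KV cache stores the per-token key/value projections computed so far, so that extending a context only requires computing entries for the new tokens rather than re-running the whole prompt.

```python
# Toy illustration of KV caching (NOT DeepSpeed's code): cached key/value
# entries for a prefix are reused, and only new tokens are projected.

def fake_kv(token):
    """Stand-in for the expensive key/value projection of one token."""
    h = hash(token)
    return (h & 0xFF, (h >> 8) & 0xFF)

class KVCache:
    def __init__(self):
        self.tokens = []    # context tokens seen so far
        self.kv = []        # cached (key, value) pair per token
        self.computed = 0   # how many projections we actually ran

    def extend(self, context):
        # The new context must extend the cached prefix; only the
        # tokens beyond that prefix get (re)computed.
        assert context[:len(self.tokens)] == self.tokens, \
            "context must extend the cached prefix"
        for tok in context[len(self.tokens):]:
            self.kv.append(fake_kv(tok))
            self.computed += 1
        self.tokens = list(context)
        return self.kv

cache = KVCache()
cache.extend(["Hello", "world"])           # computes 2 entries
cache.extend(["Hello", "world", "again"])  # computes only 1 more
print(cache.computed)  # → 3
```

The same principle applies at inference time: passing the full context (original prompt plus generated tokens) lets the engine recognize the cached prefix and attend over stored keys/values instead of recomputing them.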
Hi @mallorbc, Thanks for reporting this issue. I will see if I can reproduce it on my end. Thanks, Reza
Also, can I ask which PyTorch version you are using here?
Hi, I have fixed some bugs regarding the checkpoint loading for these model architectures. Could you please retry using this PR? You can also try our updated test-suite [here](https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py). Thanks,...