Reza Yazdani
Hi everyone, I have added some changes here that significantly reduce the loading time of this model (**from 10 min to less than 15 sec**). To test this please...
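As a rough illustration of the kind of loading path this change targets, here is a minimal sketch that builds the model on the meta device and lets DeepSpeed pull in the real weights from a checkpoint description; the model name, `checkpoints.json` layout, and dtype below are placeholder assumptions, not the exact steps from this PR.

```python
# Hypothetical sketch of meta-tensor-based fast loading (model name and paths are placeholders).
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "model-name-or-path"  # placeholder for the model this PR targets

# Build the model skeleton on the meta device so no real weights are materialized yet.
config = AutoConfig.from_pretrained(model_name)
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)

# Hand DeepSpeed a description of the sharded checkpoint files so each rank loads
# only the weights it needs, instead of materializing the full model up front.
checkpoints_json = "checkpoints.json"  # assumed layout: {"type": ..., "checkpoints": [...], "version": ...}
model = deepspeed.init_inference(
    model,
    mp_size=torch.cuda.device_count(),
    dtype=torch.float16,
    checkpoint=checkpoints_json,
    replace_with_kernel_inject=True,
)
```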
I actually have a question for you guys: has anyone tested the inference of this model on the [text_generation_inference](https://github.com/huggingface/text-generation-inference) system from HuggingFace?
Thanks for the feedback; it's great to see some of the downsides and benefits of our pipeline, and it helps us improve the stack. I just wanted to know if these...
Hi @dc3671, I have most of the fixes; however, I wanted to better understand the contributions I am bringing here. I will reopen this soon. Thanks, Reza
I worked a bit on this PR and added meta-tensor loading support. Also, Falcon-7B is runnable now. I have added a script, `test_falcon.py`, that you can use to test...
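For reference, a minimal Falcon-7B smoke test along these lines could look like the sketch below; this is only an approximation of what such a script might do, not the `test_falcon.py` from this PR, and the prompt, dtype, and generation settings are arbitrary choices.

```python
# Hypothetical minimal Falcon-7B check; this is not the PR's test_falcon.py.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, trust_remote_code=True
)

# Wrap the model with DeepSpeed-Inference; kernel injection is what this PR exercises.
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```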
Hi @lanking520, Thanks for your interest in this part. I am working on bringing this feature to the rest of the models. I will let you know once creating that a...
Hi @lanking520, I am working on resolving this issue. I will let you know once I have the solution tested completely. Thanks, Reza
Hi @lanking520, I have verified several model architectures with this PR, using this [test-suite](https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py). Everything works fine on my side. Could you please try this on your end and...
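If it helps to reproduce the check, the core of that test-suite boils down to something like the sketch below, run over a handful of architectures; the model list and settings here are my own examples, so please treat the linked `inference-test.py` as the source of truth.

```python
# Rough approximation of what the linked inference-test.py exercises; not a copy of it.
import torch
import deepspeed
from transformers import pipeline

# A few example architectures to spot-check kernel injection; swap in the models you care about.
for model_name in ["gpt2", "EleutherAI/gpt-neo-1.3B", "facebook/opt-1.3b"]:
    pipe = pipeline("text-generation", model=model_name, torch_dtype=torch.float16, device=0)
    pipe.model = deepspeed.init_inference(
        pipe.model,
        mp_size=1,
        dtype=torch.float16,
        replace_with_kernel_inject=True,
    )
    print(model_name, pipe("DeepSpeed is", max_new_tokens=20)[0]["generated_text"])
```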
Hi @mayank31398, I want to look into this. Can you please point me to the right script that I can run on my side? Thanks, Reza
Thanks @mayank31398 :)