Reza Yazdani
Hi everyone, I have added some changes here that significantly reduce the loading time of this model (**from 10 min to less than 15 sec**). To test this please...
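As a rough illustration of the kind of loading path this change targets, here is a minimal sketch that builds the model on the meta device and lets DeepSpeed pull in the real weights from a checkpoint description; the model name, `checkpoints.json` layout, and dtype below are placeholder assumptions, not the exact steps from this PR.

```python
# Hypothetical sketch of meta-tensor-based fast loading (model name and paths are placeholders).
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "model-name-or-path"  # placeholder for the model this PR targets

# Build the model skeleton on the meta device so no real weights are materialized yet.
config = AutoConfig.from_pretrained(model_name)
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)

# Hand DeepSpeed a description of the sharded checkpoint files so each rank loads
# only the weights it needs, instead of materializing the full model up front.
checkpoints_json = "checkpoints.json"  # assumed layout: {"type": ..., "checkpoints": [...], "version": ...}
model = deepspeed.init_inference(
    model,
    mp_size=torch.cuda.device_count(),
    dtype=torch.float16,
    checkpoint=checkpoints_json,
    replace_with_kernel_inject=True,
)
```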
I actually have a question for you guys: has anyone tested the inference of this model on the [text_generation_inference](https://github.com/huggingface/text-generation-inference) system from HuggingFace?
Thanks for the feedback; it's great to see some of the downsides and benefits of our pipeline, and it helps us improve the stack. I just wanted to know if these...
Hi @dc3671, I have most of the fixes; however, I wanted to better understand the contributions I am bringing here. I will reopen this soon. Thanks, Reza
I worked a bit on this PR and added meta-tensor loading support. Also, Falcon-7B is runnable now. I have added a script, `test_falcon.py`, that you can use to test...
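For reference, a minimal Falcon-7B smoke test along these lines could look like the sketch below; this is only an approximation of what such a script might do, not the `test_falcon.py` from this PR, and the prompt, dtype, and generation settings are arbitrary choices.

```python
# Hypothetical minimal Falcon-7B check; this is not the PR's test_falcon.py.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, trust_remote_code=True
)

# Wrap the model with DeepSpeed-Inference; kernel injection is what this PR exercises.
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```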
Hi @lanking520, Thanks for your interest in this part. I am working on bringing this feature to the rest of the models. I will let you know once creating that a...
Hi @lanking520, I am working on resolving this issue. I will let you know once I have the solution tested completely. Thanks, Reza
Hi @lanking520, I have verified several model architectures with this PR, using this [test-suite](https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py). Everything works fine on my side. Could you please try this on your end and...
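If it helps to reproduce the check, the core of that test-suite boils down to something like the sketch below, run over a handful of architectures; the model list and settings here are my own examples, so please treat the linked `inference-test.py` as the source of truth.

```python
# Rough approximation of what the linked inference-test.py exercises; not a copy of it.
import torch
import deepspeed
from transformers import pipeline

# A few example architectures to spot-check kernel injection; swap in the models you care about.
for model_name in ["gpt2", "EleutherAI/gpt-neo-1.3B", "facebook/opt-1.3b"]:
    pipe = pipeline("text-generation", model=model_name, torch_dtype=torch.float16, device=0)
    pipe.model = deepspeed.init_inference(
        pipe.model,
        mp_size=1,
        dtype=torch.float16,
        replace_with_kernel_inject=True,
    )
    print(model_name, pipe("DeepSpeed is", max_new_tokens=20)[0]["generated_text"])
```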
Hi @mayank31398, I want to look into this. Can you please point me to the right script that I can run on my side? Thanks, Reza
Thanks @mayank31398 :)