Reza Yazdani

95 comments by Reza Yazdani

I am glad you could run it with a large batch now! :) I think this might be related to some cache allocation issues. We are working on optimizing that part...

@pai4451 Currently, I limit the token length for each query to 128. I am gonna increase this soon. But can you try with a smaller length and see if the issue is...

Regarding the batch size, I have tried batch sizes up to 128, and it worked fine on my side.

Hi @mayank31398, I am still working on this. Can I ask what an average maximum number of tokens for an input would be? Potentially, this can go to as many...

Hi @mayank31398, Looking into it right now, let me first merge this to another PR. I will let you know. Thanks, Reza

Hi @xk503775229, Thanks for the interest in trying Int8 for other models. In general, you should be able to do so; however, one issue here is that you want to...

Hi @rahul003, I am able to repro this on my side using your script. However, when using mine, which is as follows, there is no issue with it: ``` import...

Hi @rahul003, I checked again with your script, and it seems the issue is related to setting the `low_cpu_mem_usage` flag to true when creating the model. Can you please...

Can you try with the above script that I pasted?

Also, can you please show the outputs? Thanks, Reza