Reza Yazdani
I am glad you could run it with a large batch now! :) I think this might be related to some cache-allocation issues. We are working on optimizing that part...
@pai4451 Currently, I limit the token length for each query to 128. I am going to increase this soon. But can you try with a smaller length and see if the issue is...
Regarding the batch size, I have tried with a batch size of up to 128 and it worked fine on my side.
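As a minimal sketch of what the per-query token limit and batched inference might look like (hedged: the model name, prompts, and generation settings below are placeholders, not the exact setup from this thread):

```
# Tokenize a batch of queries, truncating each to the 128-token limit
# mentioned above, then run batched generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model name
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
tokenizer.padding_side = "left"            # left-pad for decoder-only generation

queries = ["Hello, my name is", "DeepSpeed inference can"] * 64  # batch of 128
inputs = tokenizer(
    queries,
    truncation=True,
    max_length=128,   # per-query token-length limit
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```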
Hi @mayank31398, I am still working on this. Can I ask what the average maximum number of tokens per input would be? Potentially, this can go to as many...
Hi @mayank31398, Looking into it right now; let me first merge this into another PR. I will let you know. Thanks, Reza
Hi @xk503775229, Thanks for your interest in trying Int8 for other models. In general, you should be able to do so; however, one issue here is that you want to...
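A minimal sketch of what trying int8 with DeepSpeed inference could look like (hedged: the model name is a placeholder, int8 kernel support depends on the model, and the argument names follow the DeepSpeed inference API of this era):

```
# Load a Hugging Face model and request DeepSpeed's int8 inference kernels.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
model = deepspeed.init_inference(
    model,
    mp_size=1,                       # model-parallel degree
    dtype=torch.int8,                # request int8 inference kernels
    replace_with_kernel_inject=True, # inject DeepSpeed's fused kernels
)
```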
Hi @rahul003, I am able to repro this on my side using your script. However, when using mine, which is as follows, there is no issue with it: ``` import...
Hi @rahul003, I checked again with your script, and it seems the issue is related to setting the `low_cpu_mem_usage` flag to true when creating the model. Can you please...
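For reference, `low_cpu_mem_usage` is a real `from_pretrained` argument; a minimal sketch of the suggested change (the model name is a placeholder, not the script from this thread):

```
from transformers import AutoModelForCausalLM

# low_cpu_mem_usage=True reduces peak host memory while loading weights; per
# the comment above, try low_cpu_mem_usage=False to see if the issue goes away.
model = AutoModelForCausalLM.from_pretrained("gpt2", low_cpu_mem_usage=False)
```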
Can you try the script I pasted in my earlier comment?
Also, can you please show the outputs? Thanks, Reza