LsEmpire comments


Results: 3 comments of LsEmpire

Speed issue. It is slow, seems like one sec one token generated

> If you're running inference on CPU, you should expect slower speed; if you're running on a GPU, generation is much faster.

Thanks, thanks for your help and...

Speed issue. It is slow, seems like one sec one token generated

> Thanks, thanks for your reply. I see the code is from the generate_stream function in the file inference.py. Could you please help again to check if it is the 0 -...

Speed issue. It is slow, seems like one sec one token generated

Thanks

> Hi, as you are already aware, if you use the worker for the generation, you get the streaming output. If you need all the outputs at once, you...