PENG WENJIE


It doesn't seem to work, for two reasons: 1) the inference time is the same as for a single inference, and 2) the console warnings appear one by one, from which it can be inferred that the model...

Thank you for your detailed explanation, @Rocketknight1. I have started using the vLLM method, which enables efficient inference, but I'll also try the model.generate() method for batch generation....
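For reference, batched generation with model.generate() usually requires left padding, so that the last token of every prompt sits directly next to the newly generated tokens. A minimal sketch, assuming a causal LM such as `gpt2` (the model name, `max_new_tokens` value, and the `generate_batch`/`chunk` helpers are illustrative, not from the original thread):

```python
def chunk(seq, size):
    # Split a list of prompts into micro-batches to bound GPU memory use.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def generate_batch(prompts, model_name="gpt2", max_new_tokens=32):
    # Imported lazily so the sketch can be read without the library installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Left padding aligns all prompts at the generation boundary,
    # which causal LMs need for correct batched decoding.
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    texts = []
    for batch in chunk(prompts, 8):
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.pad_token_id,
        )
        texts.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return texts

if __name__ == "__main__":
    print(generate_batch(["Hello, my name is", "The capital of France is"]))
```

With right padding (the tokenizer default for many models), the pad tokens would sit between the prompt and the generated continuation, which is one common reason batched generate() appears not to work.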

Sure, this is a website for your reference: https://docs.vllm.ai/en/stable/getting_started/quickstart.html. I find that vLLM seems to be inferior to the transformers method in batch inference; maybe there is something wrong with my...
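Following that quickstart, offline batch inference with vLLM looks roughly like the sketch below (the model name and sampling values are illustrative placeholders, not taken from the original comments):

```python
def run_vllm(prompts, model="facebook/opt-125m"):
    # Imported lazily so the sketch can be read without vLLM installed.
    from vllm import LLM, SamplingParams

    # Sampling configuration mirrors the quickstart defaults.
    params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(model=model)

    # vLLM handles batching and scheduling internally: pass all
    # prompts at once and it generates them concurrently.
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]

if __name__ == "__main__":
    for text in run_vllm(["Hello, my name is", "The capital of France is"]):
        print(text)
```

Because vLLM schedules the whole prompt list itself, throughput comparisons against transformers are only fair if model.generate() is also given properly padded batches rather than one prompt at a time.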

I read the issue and tried your code, which worked perfectly. Thank you for your contribution!