llama-cpp-python
Add batched inference
- [x] Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class
- [ ] Implement batched inference support for `generate` and `create_completion` methods in `Llama` class
- [ ] Add support for streaming / infinite completion
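The core idea behind the batched-inference item is that one `llama_decode` call can advance several sequences at once, instead of decoding each prompt to completion before starting the next. The toy sketch below illustrates that scheduling pattern only; `fake_decode` is a hypothetical stand-in for the real decode call, not llama-cpp-python API:

```python
def fake_decode(batch):
    # Hypothetical stand-in for llama_decode: returns the next token
    # for every sequence in the batch in a single call.
    return [tokens[-1] + 1 for tokens in batch]

def generate_batched(prompts, n_new):
    # All in-flight sequences advance together: one decode call per
    # generation step, regardless of how many sequences are active.
    seqs = [list(p) for p in prompts]
    for _ in range(n_new):
        next_tokens = fake_decode(seqs)
        for seq, tok in zip(seqs, next_tokens):
            seq.append(tok)
    return seqs

# Two sequences, three new tokens each, only three decode calls total.
print(generate_batched([[1], [10]], 3))
```

With sequential generation this workload would cost one decode call per token per sequence (six calls here); batching cuts that to one call per step, which is where the throughput win comes from.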