
Add batched inference

Open · abetlen opened this issue 1 year ago · 34 comments

  • [x] Use `llama_decode` instead of the deprecated `llama_eval` in the `Llama` class
  • [ ] Implement batched inference support for the `generate` and `create_completion` methods of the `Llama` class
  • [ ] Add support for streaming / infinite completion
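
To illustrate what the batched-inference item above is aiming at, here is a minimal, self-contained sketch of a continuous-batching loop. Everything in it is hypothetical: `Sequence`, `fake_decode`, and `run_batched` are illustrative names, and `fake_decode` is a stand-in for a single `llama_decode` call that serves all active sequences at once, with finished sequences dropping out of the batch so new work could take their slot.

```python
# Hypothetical sketch of continuous batching (not the llama-cpp-python API).
from dataclasses import dataclass, field

@dataclass
class Sequence:
    prompt: list[int]
    generated: list[int] = field(default_factory=list)
    done: bool = False

def fake_decode(batch):
    # Stand-in for llama_decode: one call produces the next token
    # for every sequence in the batch. Here we just fabricate a token
    # from each sequence's current length.
    return [len(s.prompt) + len(s.generated) for s in batch]

def run_batched(seqs, max_new_tokens=3, eos=999):
    active = list(seqs)
    while active:
        tokens = fake_decode(active)  # one decode step serves the whole batch
        for seq, tok in zip(active, tokens):
            seq.generated.append(tok)
            if tok == eos or len(seq.generated) >= max_new_tokens:
                seq.done = True
        # Finished sequences leave the batch, freeing their slot.
        active = [s for s in active if not s.done]
    return seqs

seqs = run_batched([Sequence([1, 2]), Sequence([3])])
print([s.generated for s in seqs])  # → [[2, 3, 4], [1, 2, 3]]
```

The key property this models is that each decode step advances every active sequence together, rather than running `generate` to completion for one prompt before starting the next.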

abetlen · Sep 30 '23 06:09