llama-cpp-python
Add batched inference
- [x] Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class
- [ ] Implement batched inference support for `generate` and `create_completion` methods in `Llama` class
- [ ] Add support for streaming / infinite completion
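The core idea behind the batched-inference item is that one `llama_decode` call can advance several sequences at once, instead of decoding each prompt to completion before starting the next. The toy sketch below illustrates that scheduling pattern only; `fake_decode` is a hypothetical stand-in for the real decode call, not llama-cpp-python API:

```python
def fake_decode(batch):
    # Hypothetical stand-in for llama_decode: returns the next token
    # for every sequence in the batch in a single call.
    return [tokens[-1] + 1 for tokens in batch]

def generate_batched(prompts, n_new):
    # All in-flight sequences advance together: one decode call per
    # generation step, regardless of how many sequences are active.
    seqs = [list(p) for p in prompts]
    for _ in range(n_new):
        next_tokens = fake_decode(seqs)
        for seq, tok in zip(seqs, next_tokens):
            seq.append(tok)
    return seqs

# Two sequences, three new tokens each, only three decode calls total.
print(generate_batched([[1], [10]], 3))
```

With sequential generation this workload would cost one decode call per token per sequence (six calls here); batching cuts that to one call per step, which is where the throughput win comes from.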