haystack-core-integrations
llama.cpp: LlamaCppGenerator.run() raises 'TypeError' when passing {"stream": True}
Describe the bug
When calling LlamaCppGenerator.run() with generation_kwargs={"stream": True}, a TypeError ("'generator' object is not subscriptable") is raised at line 97, replies = [output["choices"][0]["text"]]. This happens because the create_completion function of the underlying llama-cpp-python module returns a generator object instead of a dict when streaming is enabled.
To Reproduce
Reproducible whenever run is called with generation_kwargs={"stream": True}.
E.g.
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator
g = LlamaCppGenerator(model="llama.cpp/models/llama-2-7b-chat/ggml-models-Q4_K_M.gguf", n_ctx=2048, n_batch=128, model_kwargs={"verbose": False, "use_mlock": True}) # happens no matter the model_kwargs
g.warm_up()
g.run("The purpose of life is", generation_kwargs={"stream": True})
(This won't run as-is, since the model path is specific to my machine.)
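For anyone without a local model, the failure can be illustrated with a stand-in. This is only a sketch: fake_create_completion and naive_run are hypothetical names that mimic the behaviour described above (a dict without streaming, a generator of chunks with streaming); they are not part of llama-cpp-python or the integration.

```python
from typing import Any, Dict, Iterator, List, Optional, Union


def _chunks(words: List[str]) -> Iterator[Dict[str, Any]]:
    # Yield one completion-style chunk per word, as a streaming call would.
    for word in words:
        yield {"choices": [{"text": word}]}


def fake_create_completion(
    prompt: str, stream: bool = False
) -> Union[Dict[str, Any], Iterator[Dict[str, Any]]]:
    # Hypothetical stand-in: dict when not streaming, generator when streaming.
    words = [" to", " be", " happy"]
    if stream:
        return _chunks(words)
    return {"choices": [{"text": "".join(words)}]}


def naive_run(prompt: str, stream: bool = False) -> Dict[str, Any]:
    # Mirrors the failing pattern: subscripting the output unconditionally.
    output = fake_create_completion(prompt, stream=stream)
    return {"replies": [output["choices"][0]["text"]]}


error_message: Optional[str] = None
try:
    naive_run("The purpose of life is", stream=True)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Without stream=True the same naive_run call succeeds, which is why the error only shows up when streaming is requested.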
Expected behaviour
The underlying create_completion function returns a generator in this case. So should the run function.
Fix suggestion
I guess the easiest fix would be to return the generator object in this case.
Describe your environment (please complete the following information):
- OS: Ubuntu Linux (Wsl)
- Haystack version: haystack_ai-2.0.1
- Integration version: llama-cpp-haystack-0.3.0
This is unfortunately expected, as currently no components are capable of returning a streamable object. We're working on a solution in Haystack; when it's ready, we'll roll it out to all the integrations that need it.
Thanks for replying. Yes, I figured as much. For anyone interested, I have a workaround for the time being. Add the following lines to the run function in generators.py:
# Return the raw generator instead of indexing into it when streaming.
if updated_generation_kwargs.get("stream"):
    return {"replies": [output], "meta": []}
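Then, wherever the run function is called, you can iterate over the generator and retrieve each chunk like this:
# result is the dict returned by run(); the generator is the only reply.
answer_generator = result["replies"][0]
for answer_chunk in answer_generator:
    print(answer_chunk["choices"][0]["text"], end="", flush=True)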
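Put together, the workaround might look roughly like the sketch below. It uses a stub instead of a real model, so patched_run and fake_stream are hypothetical names, and the shape of "meta" in the non-streaming branch is an assumption rather than something taken from the integration's source.

```python
from typing import Any, Dict, Iterator, List


def fake_stream(words: List[str]) -> Iterator[Dict[str, Any]]:
    # Hypothetical stand-in for the chunks produced when streaming.
    for word in words:
        yield {"choices": [{"text": word}]}


def patched_run(prompt: str, generation_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for run() with the workaround applied.
    words = [" to", " be", " happy"]
    if generation_kwargs.get("stream"):
        # Streaming: hand the generator back untouched.
        output = fake_stream(words)
        return {"replies": [output], "meta": []}
    # Non-streaming: index into the completed response as before.
    output = {"choices": [{"text": "".join(words)}]}
    return {"replies": [output["choices"][0]["text"]], "meta": [output]}  # "meta" shape assumed


result = patched_run("The purpose of life is", {"stream": True})
answer_generator = result["replies"][0]

pieces: List[str] = []
for answer_chunk in answer_generator:
    pieces.append(answer_chunk["choices"][0]["text"])

full_text = "".join(pieces)
print(full_text)
```

Note that the generator can only be consumed once, so the caller has to collect the chunks (as in pieces above) if the full text is needed afterwards.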
duplicate of #730