haystack-core-integrations llama.cpp: LlamaCppGenerator.run() raises 'TypeError' when passing {"stream": True}

Open paulgekeler opened this issue 1 year ago • 2 comments

Describe the bug

When calling LlamaCppGenerator.run() with generation_kwargs={"stream": True}, a TypeError ("'generator' object is not subscriptable") is raised at line 97, replies = [output["choices"][0]["text"]], because the create_completion function of the underlying llama-cpp-python module returns a generator object rather than a dict in this case.
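The failure can be reproduced without a model file. The following is a minimal sketch, assuming a stand-in function (fake_create_completion is hypothetical and not part of llama-cpp-python) that returns a dict normally and a generator of chunks when stream=True, with chunks shaped like the ones indexed in the failing line:

```python
def fake_create_completion(prompt, stream=False):
    """Hypothetical stand-in for the real create_completion (an assumption)."""
    chunks = [{"choices": [{"text": word}]} for word in (" to", " be", " happy")]
    if stream:
        # Streaming mode: hand back a generator that yields one chunk at a time.
        return (chunk for chunk in chunks)
    # Non-streaming mode: one dict holding the full text.
    full_text = "".join(c["choices"][0]["text"] for c in chunks)
    return {"choices": [{"text": full_text}]}


def index_like_run(output):
    # Same indexing as the line quoted in the report.
    return [output["choices"][0]["text"]]


# Works for the non-streaming dict.
replies = index_like_run(fake_create_completion("The purpose of life is"))

# Fails for the generator, because generators do not support [] indexing.
try:
    index_like_run(fake_create_completion("The purpose of life is", stream=True))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(replies)
print(error_message)
```

The indexing itself is fine; it is only the type of the returned object that changes when streaming is requested.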

To Reproduce

Reproducible whenever run is called with generation_kwargs={"stream": True}, e.g.:

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator
g = LlamaCppGenerator(model="llama.cpp/models/llama-2-7b-chat/ggml-models-Q4_K_M.gguf", n_ctx=2048, n_batch=128, model_kwargs={"verbose": False, "use_mlock": True}) # happens no matter the model_kwargs
g.warm_up()
g.run("The purpose of life is", generation_kwargs={"stream": True})

(This won't run as-is because the model path is specific to my machine.)

Expected behaviour

The underlying create_completion function returns a generator in this case, so the run function should too.

Fix suggestion

The easiest fix would probably be to return the generator object in this case.

Describe your environment (please complete the following information):

OS: Ubuntu Linux (Wsl)
Haystack version: haystack_ai-2.0.1
Integration version: llama-cpp-haystack-0.3.0

Apr 22 '24 19:04 paulgekeler

This is unfortunately expected: currently no component is capable of returning a streamable object. We're working on a solution in Haystack; when it's ready, we'll roll it out to all the integrations that need it.

May 10 '24 05:05 masci

Thanks for replying. Yes, I figured so. For anyone interested, I have a workaround for the time being: add the following lines to the run function in generators.py, after the create_completion call and before the line that builds replies.


# Workaround: return the raw generator instead of indexing into it
if updated_generation_kwargs.get("stream"):
    return {"replies": [output], "meta": []}

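Then you can iterate over the generator and retrieve each chunk like this:


answer_generator = g.run("The purpose of life is", generation_kwargs={"stream": True})["replies"][0]
for answer_chunk in answer_generator:
    print(answer_chunk["choices"][0]["text"], end="", flush=True)

Do this wherever the run function is called.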
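For anyone who wants to try the idea without patching the package, here is a rough, self-contained sketch of the same workaround. It assumes a hypothetical stand-in (fake_create_completion) in place of the real model call, and run_with_stream_workaround is an illustrative name, not the integration's actual run method. The non-streaming return shape is also an assumption.

```python
from typing import Any, Dict, Optional


def fake_create_completion(prompt: str, stream: bool = False, **kwargs: Any):
    """Hypothetical stand-in for the real model call (an assumption)."""
    chunks = [{"choices": [{"text": word}]} for word in (" to", " be", " happy")]
    if stream:
        return (chunk for chunk in chunks)
    full_text = "".join(c["choices"][0]["text"] for c in chunks)
    return {"choices": [{"text": full_text}]}


def run_with_stream_workaround(
    prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    updated_generation_kwargs = dict(generation_kwargs or {})
    output = fake_create_completion(prompt, **updated_generation_kwargs)
    # Workaround: when streaming, return the generator untouched.
    if updated_generation_kwargs.get("stream"):
        return {"replies": [output], "meta": []}
    # Otherwise index into the dict as before (return shape assumed here).
    return {"replies": [output["choices"][0]["text"]], "meta": [output]}


# Streaming: the single reply is a generator of chunks.
result = run_with_stream_workaround("The purpose of life is", {"stream": True})
pieces = []
for answer_chunk in result["replies"][0]:
    pieces.append(answer_chunk["choices"][0]["text"])
streamed_text = "".join(pieces)

# Non-streaming: the single reply is already a string.
plain = run_with_stream_workaround("The purpose of life is")

print(streamed_text)
print(plain["replies"][0])
```

Note that with this workaround the caller has to check what kind of reply came back, and the generator can only be iterated once, so collect the chunks if the full text is needed afterwards.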

May 10 '24 15:05 paulgekeler

duplicate of #730

Oct 28 '24 09:10 anakin87
