`HuggingFaceLocalGenerator` keeps generating after stopword
Describe the bug
Although I set the `stop_words` parameter of `HuggingFaceLocalGenerator` to `["Original"]`, it keeps on generating after this token is produced. The only effect of setting the stop word is that it is removed from the output.
Expected behavior
I expect generation to stop after the stop word.
Additional context
It looks like our test case checks only the removal of stop words, not that generation actually halts: https://github.com/deepset-ai/haystack/blob/main/test/components/generators/test_hugging_face_local_generator.py#L313
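To make the gap concrete, here is a hedged paraphrase of what that test covers (illustrative code and name, not the verbatim test):

```python
# Illustrative paraphrase, not the verbatim test: the assertion only
# checks that the stop word is stripped from the reply text; nothing
# verifies that token generation halted when the stop word appeared.
def test_stop_words_are_removed():
    reply = "This piece of art is so Original and beautiful!"
    for stop_word in ["Original"]:
        reply = reply.replace(stop_word, "").rstrip()
    assert "Original" not in reply
```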
To Reproduce
I am using the `stop_words` parameter just like in our documentation: https://docs.haystack.deepset.ai/v2.0/reference/generator-api#huggingfacelocalgenerator__init__

```python
import torch

from haystack.components.generators import HuggingFaceLocalGenerator

llm = HuggingFaceLocalGenerator(
    "HuggingFaceH4/zephyr-7b-beta",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
    generation_kwargs={"max_new_tokens": 350},
    stop_words=["Original"],
)
llm.warm_up()
```

And then use a template that makes the LLM generate the token "Original".
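For example (a minimal invocation sketch; the prompt is hypothetical, any template that elicits the word "Original" will do):

```python
# Illustrative invocation; the prompt below is made up for demonstration.
result = llm.run("Describe this piece of art and include the word Original.")
print(result["replies"][0])  # generation keeps going past "Original"
```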
FAQ Check
- [x] Have you had a look at our new FAQ page?
Hey @vblagoje I see that this was marked as done in the project board. Is this issue resolved?
@sjrl it should be, IIRC; let us know otherwise.
@vblagoje, @sjrl, @julian-risch Hi, I am also experiencing this issue, even though I have updated my haystack-ai package to 2.0.0.
Would you please share your example, @ss2342?
Hi @vblagoje, I will not be able to share my example, unfortunately, but it is the exact same code provided by the OP, just with a custom model and a stop word:
```python
llm = HuggingFaceLocalGenerator(
    "HuggingFaceH4/zephyr-7b-beta",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
    generation_kwargs={"max_new_tokens": 350},
    stop_words=["Original"],
)
llm.warm_up()
```
I experience the same behavior as the OP: the model simply removes the `stop_word` from the generated text instead of actually stopping the generation. So, for example, if my original output was something like this:

This piece of art is so Original and beautiful!

the addition of the `stop_word` would lead to something like this:

This piece of art is so and beautiful!
I looked into the `HuggingFaceLocalGenerator` code, and I suspect this piece of code is causing the behavior:
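Roughly, the post-processing in question looks like the following sketch (a paraphrase of the stop-word handling in `run()`, not the verbatim source):

```python
# Paraphrased sketch of the stop-word post-processing in
# HuggingFaceLocalGenerator.run(); not the verbatim source code.
if self.stop_words:
    # Strips every occurrence of every stop word from the reply,
    # regardless of where generation actually halted.
    replies = [
        reply.replace(stop_word, "").rstrip()
        for reply in replies
        for stop_word in self.stop_words
    ]
```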
@ss2342 IIRC the necessary callbacks that capture stop words are not invoked for quantized models. I'll double-check once again. But regardless, we should do a better job in that replace call and strip only the trailing stop word rather than bluntly iterating over all words. I'll reopen until I confirm these findings.
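For context, stop words on local `transformers` models are typically enforced through the `stopping_criteria` callback passed to generation, and the removal could be narrowed to the trailing word. A minimal sketch of both ideas, assuming a tokenizer is at hand (`StopOnWords` and `strip_trailing_stop_word` are illustrative names, not Haystack's actual implementation):

```python
import torch
from transformers import StoppingCriteria


class StopOnWords(StoppingCriteria):
    """Halt generation when the generated ids end with any stop-word
    token sequence. Illustrative sketch, not Haystack's actual class."""

    def __init__(self, tokenizer, stop_words):
        self.stop_ids = [
            torch.tensor(tokenizer.encode(word, add_special_tokens=False))
            for word in stop_words
        ]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop in self.stop_ids:
            if input_ids.shape[-1] >= len(stop) and torch.equal(
                input_ids[0, -len(stop):].cpu(), stop
            ):
                return True
        return False


def strip_trailing_stop_word(reply: str, stop_words: list[str]) -> str:
    """Remove a stop word only when the reply ends with it, instead of
    replacing every occurrence anywhere in the text."""
    reply = reply.rstrip()
    for stop_word in stop_words:
        if reply.endswith(stop_word):
            return reply[: -len(stop_word)].rstrip()
    return reply
```

The criteria would be passed to generation via `stopping_criteria=StoppingCriteriaList([StopOnWords(tokenizer, stop_words)])`; if that callback never fires (as suspected on the quantized path), the string replacement is the only effect left visible.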
@vblagoje for the time being, do you have any recommendations for an alternative way to have the `HuggingFaceLocalGenerator` stop generating if it sees a certain word or sequence?
It should work with many other generators that support stop words; can you use one of them? Or can you somehow avoid quantization? :-)
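For instance, hosted-API generators pass stop sequences straight to the backend, which enforces them server-side. A sketch assuming `OpenAIGenerator` with an `OPENAI_API_KEY` in the environment (the model name and prompt are just examples):

```python
from haystack.components.generators import OpenAIGenerator

# Sketch: the OpenAI API enforces "stop" server-side, so generation
# halts at the stop sequence instead of it being stripped afterwards.
llm = OpenAIGenerator(
    model="gpt-3.5-turbo",
    generation_kwargs={"stop": ["Original"]},
)
result = llm.run("Describe this piece of art and include the word Original.")
print(result["replies"][0])
```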
@vblagoje I did try running un-quantized but still experienced the same behavior, unfortunately.
Ok thanks @ss2342 I'll work on this next week until it is solved.
This works for me repeatedly with the verbatim example from above. Here is the notebook: https://github.com/vblagoje/notebooks/blob/main/hf_stop_words_test.ipynb
Please advise @masci
@ss2342 have a look at the notebook above, I've tried it with various stop words. Here are the use cases I test:
- single stop word (single token): `country` or `the`
- multiple stop words (multi-token words): `Brandenburg`, `Greenwich`
- mix of simple/complex stop words: `the`, `Greenwich`

Every time I tried, the LLM generation stopped on these stop words as designed.
I'm closing this one as not reproducible. If you disagree, @ss2342, please provide a counterexample 🙏