
`HuggingFaceLocalGenerator` keeps generating after stopword

julian-risch opened this issue 1 year ago · 10 comments

Describe the bug
Although I set the stop_words parameter of HuggingFaceLocalGenerator to ["Original"], it keeps generating after this token has been produced. The only effect of setting the stop word is that it is removed from the output.

Expected behavior
I expect the generation to stop after the stop word.

Additional context
In our test case it looks like we're checking only the removal of stop words: https://github.com/deepset-ai/haystack/blob/main/test/components/generators/test_hugging_face_local_generator.py#L313

To Reproduce
I am using the stop_words parameter just like in our documentation: https://docs.haystack.deepset.ai/v2.0/reference/generator-api#huggingfacelocalgenerator__init__

import torch
from haystack.components.generators import HuggingFaceLocalGenerator

llm = HuggingFaceLocalGenerator(
    "HuggingFaceH4/zephyr-7b-beta",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
    generation_kwargs={"max_new_tokens": 350},
    stop_words=["Original"],
)
llm.warm_up()

And then I use a template that makes the LLM generate the token "Original".


julian-risch avatar Dec 11 '23 09:12 julian-risch

Hey @vblagoje I see that this was marked as done in the project board. Is this issue resolved?

sjrl avatar Feb 14 '24 12:02 sjrl

@sjrl it should be, IIRC; let us know otherwise.

vblagoje avatar Feb 14 '24 13:02 vblagoje

@vblagoje, @sjrl, @julian-risch Hi, I am also experiencing this issue, even though I have updated my haystack-ai package to 2.0.0.

ss2342 avatar Mar 20 '24 05:03 ss2342

Would you please share your example @ss2342 ?

vblagoje avatar Mar 20 '24 13:03 vblagoje

Hi @vblagoje, unfortunately I will not be able to share my example, but it is the exact same code that the OP provided, just with a custom model and a different stop word:

llm = HuggingFaceLocalGenerator(
    "HuggingFaceH4/zephyr-7b-beta",
    huggingface_pipeline_kwargs={
        "device_map": "auto",
        "model_kwargs": {
            "load_in_4bit": True,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
    },
    generation_kwargs={"max_new_tokens": 350},
    stop_words=["Original"],
)
llm.warm_up()

I experience the same behavior as the OP: the model simply removes the stop word from the generated text instead of actually stopping the generation. For example, if my original output was something like: "This piece of art is so Original and beautiful!"

Adding the stop word leads to something like: "This piece of art is so and beautiful!"

I looked into the HuggingFaceLocalGenerator code and found the place where stop words are handled. [Screenshot: "Screenshot 2024-03-20 at 9:47:09 AM"]

I suspect this piece of code is causing the behavior.

ss2342 avatar Mar 20 '24 13:03 ss2342
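The replace-only behavior described above can be illustrated with plain string operations, independent of Haystack or transformers. A minimal sketch (hypothetical helper names, not Haystack's actual code) contrasting what the generator appears to do with the behavior the reporters expect:

```python
def remove_all_stop_words(text: str, stop_words: list[str]) -> str:
    # Reported behavior: every occurrence of each stop word is
    # stripped from the finished output; generation itself never stops.
    for word in stop_words:
        text = text.replace(word, "").strip()
    return text


def truncate_at_stop_word(text: str, stop_words: list[str]) -> str:
    # Expected behavior: cut the output at the first stop word.
    cut = len(text)
    for word in stop_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


out = "This piece of art is so Original and beautiful!"
print(remove_all_stop_words(out, ["Original"]))  # stop word deleted mid-sentence
print(truncate_at_stop_word(out, ["Original"]))  # "This piece of art is so"
```

This matches the example in the comment above: the first helper yields "This piece of art is so and beautiful!", while the second stops the text at the stop word.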

@ss2342 IIRC the necessary callbacks to capture stop words are not called for quantized models. I'll double-check once again. But regardless, we should do a better job in that replace call and replace only the last word rather than bluntly iterating over all words. I'll reopen until I confirm these findings.

vblagoje avatar Mar 21 '24 08:03 vblagoje

@vblagoje for the time being, do you have any recommendations for an alternative way to have the HuggingFaceLocalGenerator stop generating when it sees a certain word or sequence?

ss2342 avatar Mar 21 '24 13:03 ss2342

It should work with many other generators that support stop words; can you use one of them? Or could you somehow avoid quantization? :-)

vblagoje avatar Mar 21 '24 13:03 vblagoje
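As an editorial aside: conceptually, a stop-word criterion has to run inside the generation loop, checking the decoded text after each new token, rather than post-processing the final string. A toy sketch of that idea (hypothetical function names and a fake one-word-per-token "model", not the transformers StoppingCriteria API):

```python
def generate_with_stop(next_token_fn, decode_fn, stop_words, max_new_tokens):
    """Toy generation loop: request tokens one at a time and stop as
    soon as the decoded text ends with any stop word."""
    tokens = []
    for _ in range(max_new_tokens):
        tokens.append(next_token_fn(tokens))
        text = decode_fn(tokens).rstrip()
        for word in stop_words:
            if text.endswith(word):
                # Trim the stop word itself from the returned text.
                return text[: -len(word)].rstrip()
    return decode_fn(tokens)


# Fake "model" that emits a fixed word sequence, one word per call.
stream = iter(["This", "is", "so", "Original", "and", "beautiful"])
out = generate_with_stop(
    next_token_fn=lambda toks: next(stream),
    decode_fn=lambda toks: " ".join(toks),
    stop_words=["Original"],
    max_new_tokens=350,
)
print(out)  # "This is so" -- generation halts at the stop word
```

In the real library this check is what a per-step stopping callback is for; the discussion above suggests that for this generator the callback either isn't invoked or its result isn't honored, leaving only the string cleanup.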

@vblagoje I did try running un-quantized but unfortunately still experienced the same behavior.

ss2342 avatar Mar 21 '24 13:03 ss2342

Ok, thanks @ss2342. I'll work on this next week until it is solved.

vblagoje avatar Mar 21 '24 15:03 vblagoje

This works for me repeatedly with the verbatim example from above. Here is the notebook: https://github.com/vblagoje/notebooks/blob/main/hf_stop_words_test.ipynb

Please advise @masci

vblagoje avatar May 28 '24 13:05 vblagoje

@ss2342 have a look at the notebook above; I've tried it with various stop words. Here are the use cases I tested:

  • single stop word (single token) - country or the
  • multiple stop words (multi token words) - Brandenburg, Greenwich
  • mix simple/complex stop words - the, Greenwich

Every time I tried, the LLM generation stopped at these stop words, as designed.

vblagoje avatar May 28 '24 15:05 vblagoje

I'm closing this one as not reproducible. If you disagree, @ss2342, please provide a counter-example 🙏

vblagoje avatar Jun 03 '24 08:06 vblagoje