
Support for a limited vocabulary for generation

Open · mgorenstein opened this issue on Dec 11, 2023 · 7 comments

Is your feature request related to a problem? Please describe.
I would like to constrain the model output to only use a custom vocabulary comprising a list of allowable words (or, alternatively, to blacklist all other words in the vocabulary).

Describe the solution you'd like
Hugging Face's transformers library features a bad_words_ids keyword argument in model.generate that accepts a list of banned words (as token-id sequences) to exclude from its output (some discussion of this feature here).

Describe alternatives you've considered
Could this be achieved with a llama_cpp.LogitsProcessor? I am less familiar with this library and haven't found examples along these lines, so I'm unsure how straightforward this would be to implement.

mgorenstein · Dec 11 '23 20:12

You can do this now by streaming the response and excluding words yourself. E.g.:

stream = llm(
    "Don't say any bad words:",
    stream=True,
    echo=True
)
response = ""
for chunk in stream:
    choice = chunk["choices"][0]
    token_text = choice["text"]
    # drop any streamed piece that appears in the block list
    if token_text in BAD_WORDS_LIST:
        continue
    response += token_text

print("The LLM response w/o bad words:", response)

brandonrobertz · Dec 13 '23 17:12

Hi @brandonrobertz thanks for this suggestion!

I've modified my issue to be a bit clearer - basically I'd want to bias or constrain beam search so that the 'bad words' don't appear in subsequences during generation (or alternatively, to only allow specific words during generation), rather than filtering them out from a completed output.

mgorenstein · Dec 14 '23 00:12

I see. So you want an actual custom token sampler. If that's the case then you'd need to add your own in llama.cpp (or modify an existing one). Here's where the top_k sampler is (I suppose you could modify it and use a custom llama.cpp build in vendor/llama.cpp):

https://github.com/ggerganov/llama.cpp/blob/948ff137ec37f1ec74c02905917fa0afc9b97514/llama.cpp#L7364-L7387

This library really just wraps llama.cpp and doesn't provide its own samplers and whatnot, AFAICT.

brandonrobertz · Dec 14 '23 01:12

@mgorenstein you can also do this with logit_bias or a custom LogitsProcessor; however, this only works at the token level, so it's not perfect.
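
For anyone who wants a starting point, here's a minimal sketch of that token-level approach (assuming a recent llama-cpp-python where create_completion / __call__ accepts logits_processor; the model path, allow-list, and prompt are placeholders):

import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="model.gguf")  # placeholder path

# collect the token ids for every allowed word (plus a leading-space variant,
# since many words tokenize differently after a space)
ALLOWED_WORDS = ["yes", "no", "maybe"]
allowed_ids = {llm.token_eos()}  # keep EOS so generation can stop
for word in ALLOWED_WORDS:
    for variant in (word, " " + word):
        allowed_ids.update(llm.tokenize(variant.encode("utf-8"), add_bos=False))

def restrict_vocab(input_ids, scores):
    # scores holds the next-token logits; push everything outside the
    # allow-list to -inf so those tokens can never be sampled
    masked = np.full_like(scores, -np.inf)
    for tok in allowed_ids:
        masked[tok] = scores[tok]
    return masked

out = llm(
    "Answer with yes, no, or maybe:",
    max_tokens=8,
    logits_processor=LogitsProcessorList([restrict_vocab]),
)
print(out["choices"][0]["text"])

As noted above, this operates on token ids rather than words, so multi-token words need all of their pieces in the allow-list and partial overlaps can still slip through.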

abetlen · Dec 21 '23 19:12

Thanks @abetlen, LogitsProcessor is the approach I ended up taking. Have a partially working solution that I set aside, will report back when it's further along.

mgorenstein · Dec 21 '23 21:12

bad_words_ids accepts n-grams; you could've just tokenized your rejected word list.
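
For example, something along these lines on the transformers side (model name and banned phrases are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# each entry may tokenize to several ids, i.e. an n-gram to ban
bad_words_ids = tokenizer(
    ["badword", "another bad phrase"], add_special_tokens=False
).input_ids

inputs = tokenizer("Write a short greeting:", return_tensors="pt")
out = model.generate(**inputs, bad_words_ids=bad_words_ids, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))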

JulesGM · Apr 18 '24 04:04

did you look into using transformers.Constraint?
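
For reference, a rough sketch of what that looks like (placeholder model and phrase; note that Constraint subclasses such as PhrasalConstraint force phrases to appear under beam search, which is the flip side of banning or restricting vocabulary):

from transformers import AutoModelForCausalLM, AutoTokenizer, PhrasalConstraint

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# force this phrase to appear somewhere in the generated text
constraint = PhrasalConstraint(
    tokenizer("renewable energy", add_special_tokens=False).input_ids
)

inputs = tokenizer("The grid of the future will rely on", return_tensors="pt")
out = model.generate(
    **inputs,
    constraints=[constraint],
    num_beams=5,  # constrained decoding requires beam search
    max_new_tokens=30,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))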

JulesGM · Apr 18 '24 04:04