llama-cpp-python
Does this lib support contrastive search decoding?
Hi @abetlen,
I checked the parameters in both the __call__ and create_completion methods but did not see a penalty_alpha param, which enables contrastive search decoding. Can you update the decoding strategy soon @abetlen ?
As I understand it, the frequency_penalty and presence_penalty are what is referred to as the alpha value in the Contrastive Search paper. See these lines from the llama.cpp README:
`presence_penalty`: Repeat alpha presence penalty (default: 0.0, 0.0 = disabled).
`frequency_penalty`: Repeat alpha frequency penalty (default: 0.0, 0.0 = disabled).
If I'm not mistaken, presence_penalty is what you're looking for, but I may be misunderstanding something...
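For reference, here's a minimal sketch of how presence and frequency penalties are commonly applied to logits (this follows the OpenAI-style formulation; assuming llama.cpp's repeat-penalty behaves similarly is a simplification on my part, and the function name is just for illustration):

```python
from collections import Counter

def apply_penalties(logits, generated, presence_penalty=0.0, frequency_penalty=0.0):
    """Subtract penalties from the logits of tokens that already appeared.

    OpenAI-style rule (an assumption, not llama.cpp's exact code):
    logit[t] -= presence_penalty * 1[count > 0] + frequency_penalty * count
    """
    counts = Counter(generated)
    out = list(logits)
    for tok, n in counts.items():
        # presence penalty fires once per distinct token; frequency scales with count
        out[tok] -= presence_penalty + frequency_penalty * n
    return out
```

Note this penalizes *repetition of tokens already emitted*, which is a different mechanism from contrastive search's similarity penalty on hidden states.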
@ddh0
My understanding is that contrastive search does this at each step:
- Take the top-k candidate next tokens by model probability
- For each candidate, compute the maximum cosine similarity between its hidden state and the hidden states of the tokens already in the context (the "degeneration penalty")
- Score each candidate as (1 - alpha) * probability - alpha * degeneration penalty, and pick the highest-scoring one
In HF's implementation, penalty_alpha controls how much weight the degeneration penalty gets: 0.0 is just normal greedy search, 1.0 relies entirely on the penalty, and the default of 0.6 is a mixture of the two.
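The scoring step can be sketched like this (a toy sketch of the paper's formulation, not HF's actual code; the argument names and shapes are my own assumptions):

```python
import numpy as np

def contrastive_search_step(probs, cand_embs, ctx_embs, alpha=0.6):
    """One contrastive-search selection step (sketch, not HF's implementation).

    probs:     model probabilities for the top-k candidate tokens, shape (k,)
    cand_embs: hidden states the model would produce for each candidate, (k, d)
    ctx_embs:  hidden states of the tokens generated so far, (n, d)

    Scores each candidate as (1 - alpha) * p(v) - alpha * max cosine similarity
    to any previous token, then returns the argmax. alpha = 0 reduces to greedy.
    """
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    x = ctx_embs / np.linalg.norm(ctx_embs, axis=1, keepdims=True)
    degeneration = (c @ x.T).max(axis=1)  # max cos-sim per candidate
    scores = (1.0 - alpha) * probs - alpha * degeneration
    return int(np.argmax(scores))
```

With alpha = 0 this always picks the most probable token; with larger alpha, a candidate whose hidden state closely matches something already in the context gets penalized, which is what suppresses degenerate repetition.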
What's crazy is that this is one of the very few lightweight decoding techniques that can make a 7B-parameter model behave more like a 30B one. There's a blog summary here, but personally I didn't find it very helpful. It's definitely worth implementing.