
Does this lib support contrastive search decoding?


Hi @abetlen, I checked the parameters of both the `__call__` and `create_completion` methods but did not see a `penalty_alpha` param, which is what controls contrastive search decoding. Can you add this decoding strategy soon @abetlen?
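
For context, here's roughly the call I'm making; the model path and the values are just placeholders, but it shows that only the standard sampler options are exposed and there is no `penalty_alpha`:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")  # placeholder path

# Standard sampling options available today -- no penalty_alpha anywhere.
out = llm.create_completion(
    prompt="Q: What is contrastive search decoding? A:",
    max_tokens=64,
    temperature=0.7,
    top_k=40,
    top_p=0.95,
    repeat_penalty=1.1,
    presence_penalty=0.0,
    frequency_penalty=0.0,
)
print(out["choices"][0]["text"])
```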

congson1293 · Mar 05, 2024

@abetlen @congson1293

> I checked the parameters of both the `__call__` and `create_completion` methods but did not see a `penalty_alpha` param, which is what controls contrastive search decoding. Can you add this decoding strategy soon?

As I understand it, `frequency_penalty` and `presence_penalty` are what the Contrastive Search paper refers to as the alpha value. See these lines from the llama.cpp README:

    `presence_penalty`: Repeat alpha presence penalty (default: 0.0, 0.0 = disabled).

    `frequency_penalty`: Repeat alpha frequency penalty (default: 0.0, 0.0 = disabled).

If I'm not mistaken, `presence_penalty` is what you're looking for, but I may be misunderstanding something...
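
For what it's worth, here's a rough sketch of how these two penalties are usually applied to the logits (OpenAI-style); the names and structure are mine, not the actual llama.cpp code:

```python
from collections import Counter

def apply_repeat_penalties(logits, generated_tokens,
                           presence_penalty=0.0, frequency_penalty=0.0):
    """Illustrative penalty application, indexed by token id."""
    counts = Counter(generated_tokens)
    penalized = list(logits)
    for tok, count in counts.items():
        # presence_penalty: flat cost once a token has appeared at all
        # frequency_penalty: cost grows with how often it has appeared
        penalized[tok] -= presence_penalty + frequency_penalty * count
    return penalized
```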

ddh0 · Mar 23, 2024

@ddh0

My understanding is that contrastive search decoding just does this at each step:

  1. Take the top-k candidate next tokens by model probability
  2. For each candidate, compute the maximum cosine similarity between its hidden representation and the hidden representations of the tokens already in the context (the "degeneration penalty")
  3. Pick the candidate with the best trade-off between model confidence and that penalty, so tokens that are too similar to what's already been generated get penalized (sketch below)

In HF's implementation, `penalty_alpha` controls how much weight the similarity penalty gets: 0.0 reduces to plain greedy search, 1.0 relies entirely on the degeneration penalty, and the value usually recommended is 0.6, which mixes the two.
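
Here's a minimal sketch of the re-ranking step in plain numpy, just to show how small it is; this is my own illustration of the paper's scoring rule, not HF's actual code:

```python
import numpy as np

def contrastive_search_step(probs, candidate_hidden, context_hidden,
                            alpha=0.6, top_k=4):
    """Pick the next token via contrastive search scoring.

    probs:            (vocab,) next-token probabilities from the model
    candidate_hidden: (vocab, dim) hidden state each candidate would produce
    context_hidden:   (ctx_len, dim) hidden states of the context so far
    """
    # 1. restrict to the top-k most probable candidates
    candidates = np.argsort(probs)[-top_k:]

    # 2. degeneration penalty: max cosine similarity to any context token
    ctx = context_hidden / np.linalg.norm(context_hidden, axis=-1, keepdims=True)
    cand = candidate_hidden[candidates]
    cand = cand / np.linalg.norm(cand, axis=-1, keepdims=True)
    max_sim = (cand @ ctx.T).max(axis=-1)          # shape (top_k,)

    # 3. combine model confidence with the penalty and pick the best
    scores = (1 - alpha) * probs[candidates] - alpha * max_sim
    return int(candidates[np.argmax(scores)])
```

The catch is step 2: you need the hidden state each candidate would have if it were appended, which costs an extra batched forward pass per step, so it's cheap but not free.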

What's crazy is that this is one of the very few lightweight decoding techniques that can make a 7B-param model behave more like a 30B-param one. There's a blog summary here, but personally I didn't find it very helpful. It's definitely worth implementing.

ckoshka · May 13, 2024