
Get distribution from select

paucodeici opened this issue 1 year ago • 5 comments

Is your feature request related to a problem? Please describe. When using select, it would be good to get the distribution over the different options.

Describe the solution you'd like One could imagine something like select(options=..., name="john", return_distribution=True)

Then we could get it through lm["john.distribution"] or something similar.

Describe alternatives you've considered Right now I don't know how to do that. I can try. I see how to get the logits for the generated text thanks to _cache_state, but I don't know how to get the distribution over the options that were not chosen (except by forcing the LLM to output each option and computing its likelihood; I think this is the only way, but it is heavy without a function to wrap all this code :D).
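For illustration, a rough sketch of the kind of API I have in mind (return_distribution and the "john.distribution" key are hypothetical, not part of the current guidance API):

from guidance import models, select

llama2 = models.Transformers("meta-llama/Llama-2-7b-hf")  # placeholder model

# Hypothetical: return_distribution is NOT a real select() parameter today
lm = llama2 + "Best name: " + select(options=["John", "Amy"], name="john", return_distribution=True)

print(lm["john"])               # the chosen option
print(lm["john.distribution"])  # hypothetical, e.g. {'John': 0.8, 'Amy': 0.2}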

paucodeici avatar Nov 16 '23 16:11 paucodeici

_cache_state["logits"] would be a vector of size 32000 lets say if you are considering llama2.

If I convert the options to tokens, let's say John=32, Amy=27941, etc., and then use these token IDs to index into the logits and take a softmax, I would get a probability distribution over the options.
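Roughly, something like this (a sketch; the token IDs are illustrative, and this assumes _cache_state["logits"] holds the logits for the final position):

import torch

all_logits = lm._cache_state["logits"]  # vector of size 32000 for Llama 2
option_token_ids = [32, 27941]          # illustrative IDs for John, Amy
option_logits = torch.tensor([float(all_logits[i]) for i in option_token_ids])
option_probs = torch.softmax(option_logits, dim=0)  # distribution over the options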

Question: the default temperature of the model is 0, so shouldn't select be returning the most probable option from the above distribution by default? But I ran some experiments where the answer from select was not necessarily the same as the most likely one, even with temperature = 0. Why would this happen? @slundberg

iamshnoo avatar Nov 18 '23 16:11 iamshnoo

Can you share an example @iamshnoo? I think there is either a bug or (more likely) you're not accounting for token healing issues. For example, if you run lm + 'This is a prompt ' + select(['dog', 'cat']), you need to look at the logprobs of ' dog' given 'This is a prompt', not of 'dog' given 'This is a prompt ' (notice the whitespace).
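To see the difference concretely, compare the two tokenizations (a sketch, assuming an HF Llama tokenizer; the model name is a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# 'dog' and ' dog' tokenize differently, so their logprobs live at
# different token IDs depending on where the prompt's whitespace ends
print(tokenizer.encode("dog"))
print(tokenizer.encode(" dog"))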

marcotcr avatar Nov 18 '23 17:11 marcotcr

Oh, good point @marcotcr! I was not accounting for token healing. If I look at the probabilities of ' dog' and ' cat' for your example, then I get 0.5 for each (is this expected?), as opposed to previously, when for 'dog' and 'cat' I got different probabilities even at temp=0.

For reference, the code I am using to look at probabilities is the following:

import torch

# llama2 is a guidance model and tokenizer its HF tokenizer (defined elsewhere).
# split_options is a list of strings with no trailing or leading whitespace.
# prompt is in Alpaca format, ending in "Response:\n" with no space after the \n.
# out_select is the output of select.
lm = llama2 + prompt + f" {select(split_options, name='answer')}"
out_select = lm["answer"]

# out_sentence is the output of taking the max of the logits,
# but note that the probabilities are the same for each option,
# so out_sentence is just the first item of the option_probs dict.
all_logits = lm._cache_state["logits"]
split_options = [" " + o for o in split_options]  # add a leading space to each option (undo token healing)
option_tokens = [tokenizer.encode(o) for o in split_options]
option_tokens = [o[1] for o in option_tokens]  # tokenizer adds <s> to each option in the above line
option_logits = [all_logits[o] for o in option_tokens]
option_probs = torch.softmax(torch.tensor(option_logits), dim=0)
option_probs = [float(o) for o in option_probs]
option_probs = dict(zip(split_options, option_probs))
out_sentence = max(option_probs, key=option_probs.get)
out_sentence = out_sentence.strip()  # strip the added space

Questions:

  1. But if dog and cat are equally likely, then why does select choose one over the other? (i.e., if guidance returns dog as the answer, how can I justify that answer unless there is some difference in probability compared to cat?)
  2. Also, as noted in #449, why is the select call deterministic, with outputs that do not change even when I pass a temperature other than 0 to the model?

iamshnoo avatar Nov 18 '23 19:11 iamshnoo

@iamshnoo I don't understand why you get the full distribution. This works only if the different options have different first tokens, so that choosing the first token is equivalent to choosing one of the options. Does this happen often? I don't know the exact tokens, but you can imagine something like

  • option 1 = token_1 ...
  • option 2 = token_1 ...
  • option 3 = token_2 ...

In this case you need to generate each option and compute its likelihood (not exactly, but you get my point, I hope). Maybe I am missing something, but I really don't see how you can do it in one generation in every case.
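A minimal sketch of what I mean by scoring every option, assuming a plain transformers model rather than guidance (the model name is a placeholder):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def option_log_likelihood(prompt, option):
    """Sum of log P(option token | prompt + preceding option tokens)."""
    # assumes the prompt's tokenization is a prefix of the full tokenization
    prompt_ids = tokenizer.encode(prompt, return_tensors="pt")
    full_ids = tokenizer.encode(prompt + option, return_tensors="pt")
    with torch.no_grad():
        logits = model(full_ids).logits            # (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        # logits at position pos-1 predict the token at position pos
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

options = ["dog", "cat"]
scores = [option_log_likelihood("This is a prompt", " " + o) for o in options]
probs = torch.softmax(torch.tensor(scores), dim=0)  # renormalize over the options
print(dict(zip(options, probs.tolist())))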

paucodeici avatar Nov 20 '23 09:11 paucodeici

@iamshnoo In option_tokens = [o[1] for o in option_tokens], the token at position [1] is a space. tokenizer.encode(' dogs')=[1, 29871, 26361]. The space is common to all options, which is why you are getting the same probability.
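Given that encoding, one way to pick out the intended first option token is to skip both the <s> and the space (a sketch, matching the snippet above):

option_tokens = [tokenizer.encode(o) for o in split_options]  # e.g. [1, 29871, 26361] for ' dogs'
option_tokens = [o[2] for o in option_tokens]  # [0] is <s>, [1] is the space 29871; [2] is the option's first real token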

MaveriQ avatar Mar 17 '24 18:03 MaveriQ