Is it possible to get `select` logprobs like in versions <0.1.0?

Open wjn0 opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

In old (handlebar) versions one could do something like:

program = guidance("The quick brown fox jumps over the lazy {{select 'animal' options=valid_animals logprobs='animals_logprobs'}}")
returned = program(valid_animals=["dog", "cat"])
print(returned["animals_logprobs"])

to get the relative logprobs of each option (i.e. in this case a dict of length 2). Is that possible?
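To make "relative logprobs" concrete, here is a minimal sketch, assuming each option already has a raw total log-probability under the model. The scores below are made up for illustration, and `normalize_option_logprobs` is a hypothetical helper, not part of guidance. It renormalizes the scores over the option set (a log-softmax) so the options' probabilities sum to 1:

```python
import math

def normalize_option_logprobs(option_logprobs):
    """Renormalize raw per-option log-probs over the option set (log-softmax)."""
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(option_logprobs.values())
    log_total = m + math.log(sum(math.exp(lp - m) for lp in option_logprobs.values()))
    return {opt: lp - log_total for opt, lp in option_logprobs.items()}

# Made-up raw scores, for illustration only (not real model output).
raw = {"dog": -1.2, "cat": -2.9}
relative = normalize_option_logprobs(raw)
print(relative)
```

The result is a dict with one entry per option, which is the shape the old `logprobs='animals_logprobs'` argument returned in the example above.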

Describe the solution you'd like

Something similar to the above maybe.

Describe alternatives you've considered

llm._cache_state["logits"] is not the same thing -- only provides next-token logits, as I understand it.
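The distinction matters when an option spans more than one token: its log-probability is the sum of the conditional log-probabilities of its tokens, so a single next-token distribution is not enough. A rough sketch with a toy stand-in for the model follows. `toy_next_token_logprobs`, the token splits, and the probabilities are all hypothetical, not guidance APIs or real model output:

```python
import math

def toy_next_token_logprobs(prefix_tokens):
    """Hypothetical stand-in for a model: next-token log-probs given a prefix."""
    table = {
        (): {"d": math.log(0.6), "c": math.log(0.4)},
        ("d",): {"og": math.log(0.9), "eer": math.log(0.1)},
        ("c",): {"at": math.log(0.5), "ow": math.log(0.5)},
    }
    return table[tuple(prefix_tokens)]

def option_logprob(option_tokens, next_token_logprobs):
    """Sum conditional token log-probs: log P(t1) + log P(t2 | t1) + ..."""
    total = 0.0
    prefix = []
    for tok in option_tokens:
        total += next_token_logprobs(prefix)[tok]
        prefix.append(tok)
    return total

# Each option needs one model query per token, not just the first one.
scores = {
    "dog": option_logprob(["d", "og"], toy_next_token_logprobs),
    "cat": option_logprob(["c", "at"], toy_next_token_logprobs),
}
print(scores)
```

Looking only at the first-token logits would compare "d" against "c" and ignore how likely each continuation is.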

EDIT:

Ah, just found this comment - https://github.com/guidance-ai/guidance/blob/cf355c7ac12ce7ce9ddddea115329d7ec9eeb939/guidance/_grammar.py#L489C6-L489C6 - so maybe not.

I would be interested in implementing this if the owners are amenable, but I would need a little bit of guidance (ha) to better understand the repo, and to know whether the <0.1 implementation should just be pulled forward or whether something new is needed.

wjn0, Jan 15 '24 02:01

Happy to help implement this

talglobus, Apr 07 '24 05:04

@wjn0 have you figured out how to do this yet? I'm not sure this is the same thing you're asking for, but I found a hacky way that works for me using llamacpp. Tinkering around in a notebook, I noticed that setting compute_log_probs=True and echo=True shows the output, and hovering over each token shows its log prob. So the information is there. Since I couldn't find anything in the docs and was too lazy to change code in the library, I simply parsed the spans from the HTML to get each token and its prob.

from bs4 import BeautifulSoup

def extract_spans_from_html(html_content):
    """
    Extracts the text and hover title of each <span> in the given HTML content.

    Parameters:
    - html_content (str): A string containing HTML content.

    Returns:
    - list of dict: One {'text', 'prob'} dict per <span> found. 'prob' is the
      raw 'title' attribute (the hover text), or None if the span has none.
    """
    soup = BeautifulSoup(html_content, 'html.parser')
    results = []
    for span in soup.find_all('span'):
        results.append({
            'text': span.text,
            # .get() avoids a KeyError on spans without a title attribute
            'prob': span.attrs.get('title'),
        })
    return results

spans = extract_spans_from_html(lm._html())

Edit: never mind, this is clearly not what you were looking for, as it only shows the probs for the generated output and not for the alternatives.

jtha, Apr 10 '24 14:04