optillm supports logits or logprobs in the API
Add documentation to show how to use optillm with the local inference server to get logits.
This is a commonly requested feature in ollama (https://github.com/ollama/ollama/issues/2415) that is already supported in optillm and works well.
The following code shows how to use the OpenAI-compatible API to get logprobs from optillm:
```python
import json
from openai import OpenAI

# Point the client at the optillm local inference server.
# The base_url and api_key below are the typical local defaults;
# adjust them to match your own optillm setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

messages = [
    {
        "role": "user",
        "content": "How many rs are there in strawberry? Use code to solve the problem.",
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=messages,
    temperature=0.6,
    logprobs=True,
    top_logprobs=3,
)

print(response.choices[0].message.content)
print(json.dumps(response.choices[0].message.logprobs, indent=2))
```
This will output the logprobs for the top 3 tokens at every position of the response, in the standard OpenAI logprobs object format (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logprobs).
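As a minimal sketch of working with that object (assuming the logprobs field comes back as a plain dict following the standard schema linked above), you can walk the per-token entries and convert the log-probabilities into probabilities:

```python
import math

# Minimal sketch: iterate over the per-token entries of the logprobs dict.
# Assumes the standard OpenAI schema:
# {"content": [{"token", "logprob", "top_logprobs": [{"token", "logprob"}, ...]}, ...]}
logprobs = response.choices[0].message.logprobs

for entry in logprobs["content"]:
    # The token the model actually generated at this position
    chosen = entry["token"]
    # The top-3 candidates with their probabilities (exp of the logprob)
    alternatives = ", ".join(
        f"{alt['token']!r}={math.exp(alt['logprob']):.3f}"
        for alt in entry["top_logprobs"]
    )
    print(f"{chosen!r}: {alternatives}")
```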
You can also take that JSON output and paste it into the LogProbsVisualizer (https://huggingface.co/spaces/codelion/LogProbsVisualizer) to generate charts and analyze them.
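If you prefer to save the JSON to a file before uploading it to the visualizer, a small snippet like the following works (the filename is just an example):

```python
# Write the logprobs JSON to a file so it can be pasted/uploaded into the
# LogProbsVisualizer space; "logprobs.json" is just an example filename.
with open("logprobs.json", "w") as f:
    json.dump(response.choices[0].message.logprobs, f, indent=2)
```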