optillm supports logits or logprobs in the API
Add documentation to show how to use optillm with the local inference server to get logits.
This is a commonly requested feature in ollama (https://github.com/ollama/ollama/issues/2415) that is already supported in optillm and works well.
The following code shows how to use the OpenAI-compatible API to get logprobs from optillm:
```python
import json
from openai import OpenAI

# Point the client at the optillm local inference server.
# The base_url and api_key below are the typical local defaults;
# adjust them to match your own optillm setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

messages = [
    {
        "role": "user",
        "content": "How many rs are there in strawberry? Use code to solve the problem.",
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=messages,
    temperature=0.6,
    logprobs=True,
    top_logprobs=3,
)

print(response.choices[0].message.content)
print(json.dumps(response.choices[0].message.logprobs, indent=2))
```
This will output the logprobs for the top 3 tokens at every position of the response, in the standard OpenAI logprobs object format (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logprobs).
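As a minimal sketch of working with that object (assuming the logprobs field comes back as a plain dict following the standard schema linked above), you can walk the per-token entries and convert the log-probabilities into probabilities:

```python
import math

# Minimal sketch: iterate over the per-token entries of the logprobs dict.
# Assumes the standard OpenAI schema:
# {"content": [{"token", "logprob", "top_logprobs": [{"token", "logprob"}, ...]}, ...]}
logprobs = response.choices[0].message.logprobs

for entry in logprobs["content"]:
    # The token the model actually generated at this position
    chosen = entry["token"]
    # The top-3 candidates with their probabilities (exp of the logprob)
    alternatives = ", ".join(
        f"{alt['token']!r}={math.exp(alt['logprob']):.3f}"
        for alt in entry["top_logprobs"]
    )
    print(f"{chosen!r}: {alternatives}")
```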
You can also take that JSON output and paste it into the LogProbsVisualizer (https://huggingface.co/spaces/codelion/LogProbsVisualizer) to generate charts and analyze them.
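If you prefer to save the JSON to a file before uploading it to the visualizer, a small snippet like the following works (the filename is just an example):

```python
# Write the logprobs JSON to a file so it can be pasted/uploaded into the
# LogProbsVisualizer space; "logprobs.json" is just an example filename.
with open("logprobs.json", "w") as f:
    json.dump(response.choices[0].message.logprobs, f, indent=2)
```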