include reasoning tokens in ui
DeepSeek returns reasoning tokens (https://api-docs.deepseek.com/guides/reasoning_model). We should use the HTML details/summary feature for this.
OpenRouter is going to support this for all reasoning models.
This will also be interesting for explicitly carrying reasoning context over when switching models to do function calls, etc., which reasoning models are bad at.
Could you elaborate on this one? I would like to understand the entire idea 💡
We can render messages using https://developer.mozilla.org/en-US/docs/Web/HTML/Element/details
Note that `<summary>` holds the always-visible part and the rest of `<details>` is collapsed, so the answer goes in the summary and the reasoning goes in the body:

```html
<details>
  <summary>content from r1 (same as what we see now)</summary>
  reasoning_content from the r1 model (we don't parse these out at the moment)
</details>
```

E.g., when collapsed, the user would only see:

content from r1 (same as what we see now)
@tarasglek sounds interesting, may I try?
Just now found time to try the new DeepSeek; their deep thinking option is impressive, way better than what OpenAI's o1 model offers.
Here's my proposal:
- Add a checkbox (or something similar) so the user can see reasoning tokens
- If the checkbox is unchecked, the user receives a regular fast response without deep thinking
Questions: I assume this isn't supported by every model, so how should I ship it, given that the user may choose any model they like?
I would put any UI for selecting/de-selecting this into the Preferences Modal vs. adding to the prompt area, which is already too busy.
I think we should just add a thinking feature to our data model. E.g., add a `.reasoning_content` property to our messages, like they do, and when it's present, show it in the UI during streaming and collapse it afterward.
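A minimal sketch of what that could look like on our message data model (the type and field names below are hypothetical, not ChatCraft's actual classes):

```ts
// Hypothetical sketch: names are assumptions, not ChatCraft's real types.
interface ChatCraftMessage {
  id: string;
  role: "system" | "user" | "assistant";
  // The final answer text (what we render today).
  content: string;
  // Optional chain-of-thought text, populated only for reasoning models
  // (e.g. DeepSeek r1's `reasoning_content`). Absent for regular models.
  reasoning_content?: string;
}

// UI rule of thumb: show reasoning expanded while it streams, collapse when done.
function isReasoningExpanded(message: ChatCraftMessage, isStreaming: boolean): boolean {
  return !!message.reasoning_content && isStreaming;
}
```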
This has been requested again on Discord recently:
Basically, there's no evidence of reasoning happening (and no way to set a reasoning budget/amount/threshold), so I'm not sure that reasoning is actually occurring. The ask is to show it in the UI whenever the model reasons, so folks are aware of it. I don't have confidence that reasoning is happening under the hood, especially with sonnet-4-5's threaded stuff.
See https://openrouter.ai/docs/api-reference/responses-api/reasoning. We should figure out the right way to pull these reasoning messages out as the response streams and show them.
Some more background from Claude:
Streaming Reasoning Messages in Chat Completions
When streaming chat completions with reasoning models (like o1), the reasoning content is returned through delta chunks in the stream, similar to regular message content.
Stream Structure
```js
// Each chunk in the stream has this structure
{
  id: "chatcmpl-...",
  object: "chat.completion.chunk",
  created: 1234567890,
  model: "o1-preview",
  choices: [{
    index: 0,
    delta: {
      reasoning_content: "Let me think about this...",
      content: "The answer is..."
    },
    finish_reason: null
  }]
}
```
Key Points

- Two separate fields: reasoning appears in `delta.reasoning_content`, while the final answer appears in `delta.content`
- Incremental delivery: both fields stream incrementally, token by token, just like regular streaming responses
- Finish reason: when complete, you'll see `finish_reason: "stop"` or `finish_reason: "length"`
Example: Processing Streamed Reasoning
```js
import OpenAI from "openai";

// Assumes OPENAI_API_KEY is set in the environment.
const openai = new OpenAI();

async function streamWithReasoning(messages) {
  const stream = await openai.chat.completions.create({
    model: "o1-preview",
    messages: messages,
    stream: true
  });

  let reasoning = "";
  let content = "";

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    // Reasoning and answer tokens arrive in separate fields; accumulate both.
    if (delta?.reasoning_content) {
      reasoning += delta.reasoning_content;
      console.log("Reasoning:", delta.reasoning_content);
    }
    if (delta?.content) {
      content += delta.content;
      console.log("Content:", delta.content);
    }
  }

  return { reasoning, content };
}
```
The reasoning tokens are not counted toward your output token usage; only the final content tokens are billed.
So perhaps when we render the streaming message, we can have a separate area for the reasoning in the UI. Maybe we do this as @tarasglek suggests with summary and details, or maybe we split the parent node of the message content in the React component such that we can put the reasoning portion into another slot that's collapsed (when fully rendered and complete) or showing (as it streams in).
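A rough sketch of that second option, assuming a hypothetical component shape (names are illustrative, not our actual React code):

```tsx
import { useState } from "react";

// Hypothetical component and prop names for illustration only.
type MessageProps = {
  content: string;
  reasoning?: string;
  isStreaming: boolean;
};

function MessageWithReasoning({ content, reasoning, isStreaming }: MessageProps) {
  // Track whether the user has manually toggled the reasoning section.
  const [open, setOpen] = useState(false);

  return (
    <div>
      {reasoning && (
        // Expanded while the reasoning streams in, collapsed once complete
        // (unless the user re-opens it).
        <details
          open={isStreaming || open}
          onToggle={(e) => setOpen(e.currentTarget.open)}
        >
          <summary>Reasoning</summary>
          <pre>{reasoning}</pre>
        </details>
      )}
      <div>{content}</div>
    </div>
  );
}

export default MessageWithReasoning;
```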
This needs some thought. First task would be to figure out how to get at the reasoning_content in the streaming chunks, maybe in https://github.com/tarasglek/chatcraft.org/blob/main/src/lib/ai.ts#L38-L59.
After that, update https://github.com/tarasglek/chatcraft.org/blob/main/src/lib/ai.ts#L205 to allow passing that extra content in such a way that it can get passed through to the UI.
A bunch more needs to happen, but this would be a good place to begin research.
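To make that first step concrete, here is a minimal sketch of pulling reasoning out of streamed deltas. The `reasoning` vs. `reasoning_content` field names vary by provider (OpenRouter-style vs. DeepSeek-style), and the callback shape is an assumption, not the actual `ai.ts` code:

```ts
// Sketch only: provider SDK types don't declare reasoning fields, so widen the delta type.
type StreamingDelta = {
  content?: string;
  reasoning?: string; // OpenRouter-style field
  reasoning_content?: string; // DeepSeek-style field
};

type StreamingCallbacks = {
  onContentToken: (token: string) => void;
  onReasoningToken: (token: string) => void;
};

function handleDelta(delta: StreamingDelta, callbacks: StreamingCallbacks) {
  // Providers disagree on the field name, so check both.
  const reasoningToken = delta.reasoning ?? delta.reasoning_content;
  if (reasoningToken) {
    callbacks.onReasoningToken(reasoningToken);
  }
  if (delta.content) {
    callbacks.onContentToken(delta.content);
  }
}
```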
See also https://platform.openai.com/docs/guides/reasoning
Hey folks. Just a UAT note on the interleaved Anthropic-style reasoning. The first screenshot is a failure. (The study comes from medRxiv: https://www.medrxiv.org/content/10.1101/2024.10.07.24314963v1.)
Prompt:

```
Once you have generated your slide, review it and make sure that the text on the slide can fit a standard high level powerpoint slide, suggest edits and then make a second version.
```
(This is a bad prompt, but just one designed to make it interleave thinking and doing.)
<img width="1524" height="730" alt="Image" src="https://github.com/user-attachments/assets/6a06c304-b840-4905-b233-3cc68067125b" />

This is what their platform shows.
https://claude.ai/share/a2f55986-ab8d-4e58-ad76-8b51fb0c07c4 is the same thing on Claude.
It is the thinking -> doing -> thinking loop that will be useful to show to students.
> The reasoning tokens are not counted toward your output token usage; only the final content tokens are billed.
This bit seems inaccurate based on the official docs.
Edit: Also found this reading a bit further. I guess it's different for chat completions.
OK, did some more reading, and it seems like the Chat Completions API does not return reasoning tokens/summaries in responses.
Sample chat completion with a reasoning model
```json
{
  "id": "chatcmpl-CZ7XYzx58pkJjk9JAlCaXcIQWbwz2",
  "object": "chat.completion",
  "created": 1762486500,
  "model": "o1-2024-12-17",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Chat Completions API does not include its internal “chain-of-thought” or reasoning\ntokens in its output by default. It provides a natural language response (the answer\nportion you see), but it does not share the hidden reasoning steps used to generate\nthat response. You can, of course, prompt the model to provide summaries or step-by-step\nsolutions in the text of its reply, but there is no separate “reasoning token” stream\navailable from the API.",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 221,
    "completion_tokens": 1200,
    "total_tokens": 1421,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 1088,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_f05e0cf0c6"
}
```
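That said, the `usage` block above does expose how many reasoning tokens were spent, so we could at least surface a count in the UI even when no reasoning text is returned. A minimal sketch based on the response shape above:

```ts
// Sketch: pull the reasoning token count out of a chat completion's usage block.
type CompletionUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: {
    reasoning_tokens?: number;
  };
};

function reasoningTokenCount(usage: CompletionUsage): number {
  // 1088 in the sample above, even though no reasoning text came back.
  return usage.completion_tokens_details?.reasoning_tokens ?? 0;
}
```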
But I also see that the conversation started specifically with DeepSeek reasoning models supporting this feature via the Chat Completions interface. Does that mean we are aiming to support this feature just for DeepSeek models? If so, doesn't it go against ChatCraft's philosophy of broad provider support?
And if we are aiming to add support for most reasoning models, which would be through the standard OpenAI interface, wouldn't we have to migrate to the Responses API first?
E.g., I see that the docs @humphd shared above also refer to the Responses API.
Just trying to get a better sense of the goals here :)
@Amnish04 you're asking good questions, and running into the limits of my knowledge of "reasoning," which I haven't used.
I'm ambivalent about the Responses API, but do note that OpenRouter has support for it.
I, personally, don't want to get into supporting one-off things for a particular model, since that ends up making the code a mess. Our current image and tts stuff is like this now.
As an aside, the Vercel AI components have support for this: https://ai-sdk.dev/elements/components/reasoning. If we ever switch to the AI SDK (which would be smart, I think), this would be another thing we could leverage.
I think there are starting to be two competing reasoning standards.
But right now, basic reasoning support à la https://openrouter.ai/docs/use-cases/reasoning-tokens will give useful parallels for most of the models my students use (Anthropic and Gemini).
Going beyond this for specific providers is likely way out of scope.
However, there is the inline stuff using Anthropic's API (with them and the Chinese company going that direction), and then there is whatever OpenAI is going with.
There are laws/guidance against the use of DeepSeek here in Australia. I would say openrouter.ai only, as a broad cross-platform coalition.
Otherwise it is OpenAI, Anthropic, Ollama, and Gemini hooks on top.
I think supporting https://openrouter.ai/docs/use-cases/reasoning-tokens makes the most sense. I'd love to switch to only supporting OpenRouter tbh, but that's beyond this.
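For reference, a sketch of what an OpenRouter-style reasoning request might look like, based on their reasoning-tokens docs; the model slug and the exact shape of the `reasoning` parameter and response fields are assumptions that should be verified against the linked page:

```ts
// Sketch of an OpenRouter-style reasoning request (field names per their docs;
// verify against https://openrouter.ai/docs/use-cases/reasoning-tokens).
const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4.5", // illustrative slug
    messages: [{ role: "user", content: "What is bigger, 9.9 or 9.11?" }],
    // One knob normalized across providers: an effort level
    // (or an explicit token budget via `max_tokens`).
    reasoning: { effort: "high" },
  }),
});

const data = await response.json();
// Per the docs, reasoning text comes back alongside the regular content.
console.log(data.choices[0].message.reasoning);
console.log(data.choices[0].message.content);
```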