
include reasoning tokens in ui

Open tarasglek opened this issue 10 months ago • 18 comments

DeepSeek returns reasoning tokens (https://api-docs.deepseek.com/guides/reasoning_model). We should use the HTML details/summary feature to display them.

OpenRouter is going to support this for all reasoning models.

This will also be interesting for explicitly carrying the reasoning context along when switching models to do function calls, etc., which reasoning models are bad at.

tarasglek avatar Jan 26 '25 09:01 tarasglek

Could you elaborate on this one? I would like to understand the entire idea 💡

mulla028 avatar Jan 26 '25 18:01 mulla028

We can render messages using https://developer.mozilla.org/en-US/docs/Web/HTML/Element/details

<details>
  <summary>content from r1 (same as what we see now)</summary>
  reasoning_content from r1 model (we don't parse these out atm)
</details>

E.g. the summary (the content we see now) stays visible, with the reasoning_content from r1 (which we don't parse out at the moment) collapsed underneath.
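A rough sketch of how that could look as a React component (the component and prop names below are made up for illustration, not ChatCraft's actual message renderer):

import React from "react";

// Sketch only: names are placeholders, not ChatCraft's real code.
type MessageProps = {
  content: string;
  reasoningContent?: string; // e.g. reasoning_content from the r1 model
};

function MessageWithReasoning({ content, reasoningContent }: MessageProps) {
  return (
    <div>
      {reasoningContent && (
        <details>
          <summary>Reasoning</summary>
          <pre>{reasoningContent}</pre>
        </details>
      )}
      <div>{content}</div>
    </div>
  );
}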

tarasglek avatar Jan 26 '25 18:01 tarasglek

@tarasglek sounds interesting, may I try?

mulla028 avatar Jan 28 '25 19:01 mulla028

Just now found time to try the new DeepSeek; their deep-thinking option is impressive, way better than OpenAI's o1 model.

mulla028 avatar Jan 30 '25 21:01 mulla028

Here's my proposal:

  • Add a checkbox (or something similar) so the user can choose to see reasoning tokens
  • If the checkbox is unchecked, the user receives a regular fast response without deep thinking

Question: I assume this isn't supported by every model, so how should I handle it, given that the user may choose any model?

mulla028 avatar Jan 30 '25 21:01 mulla028

I would put any UI for selecting/de-selecting this into the Preferences Modal vs. adding to the prompt area, which is already too busy.

humphd avatar Jan 31 '25 02:01 humphd

I think we should just add a thinking feature to our data model. E.g. add a .reasoning_content field to our messages, like they do, and when it's present, show it in the UI during streaming and collapse it afterward.
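Roughly, sketched with placeholder names (not the real ChatCraft message types):

// Placeholder sketch of the data model change, not the actual ChatCraft message class.
interface StoredMessage {
  id: string;
  role: "user" | "assistant" | "system";
  content: string;
  // Optional, mirroring DeepSeek's reasoning_content; only present for reasoning models.
  reasoningContent?: string;
}

// While streaming: render reasoningContent as it arrives.
// Once the response is complete: collapse it (e.g. behind details/summary).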

tarasglek avatar Jan 31 '25 07:01 tarasglek

This has been requested again on Discord recently:

Basically, there's no evidence of reasoning happening (or to set reasoning budget/amount/thresholds) so I'm not sure that reasoning is actually occurring. But yeah, basically, to show whenever it reasons in the UI so folks are aware of it. Basically, I don't have confidence that reasoning is happening under the hood, especially sonnet-4-5's threaded stuff

See https://openrouter.ai/docs/api-reference/responses-api/reasoning; we should figure out the right way to pull these reasoning messages out as they stream and show them in the UI.

humphd avatar Nov 07 '25 00:11 humphd

Some more background from Claude:

Streaming Reasoning Messages in Chat Completions

When streaming chat completions with reasoning models (like o1), the reasoning content is returned through delta chunks in the stream, similar to regular message content.

Stream Structure

// Each chunk in the stream has this structure
{
  id: "chatcmpl-...",
  object: "chat.completion.chunk",
  created: 1234567890,
  model: "o1-preview",
  choices: [{
    index: 0,
    delta: {
      reasoning_content: "Let me think about this...",
      content: "The answer is..."
    },
    finish_reason: null
  }]
}

Key Points

  1. Two separate fields: Reasoning appears in delta.reasoning_content while the final answer appears in delta.content

  2. Incremental delivery: Both fields stream incrementally, token by token, just like regular streaming responses

  3. Finish reason: When complete, you'll see finish_reason: "stop" or finish_reason: "length"

Example: Processing Streamed Reasoning

import OpenAI from "openai";

const openai = new OpenAI();

async function streamWithReasoning(messages) {
  const stream = await openai.chat.completions.create({
    model: "o1-preview",
    messages: messages,
    stream: true
  });

  let reasoning = "";
  let content = "";

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    
    if (delta?.reasoning_content) {
      reasoning += delta.reasoning_content;
      console.log("Reasoning:", delta.reasoning_content);
    }
    
    if (delta?.content) {
      content += delta.content;
      console.log("Content:", delta.content);
    }
  }

  return { reasoning, content };
}

The reasoning tokens are not counted toward your output token usage—only the final content tokens are billed.

humphd avatar Nov 07 '25 00:11 humphd

So perhaps when we render the streaming message, we can have a separate area for the reasoning in the UI. Maybe we do this as @tarasglek suggests with summary and details, or maybe we split the parent node of the message content in the React component such that we can put the reasoning portion into another slot that's collapsed (when fully rendered and complete) or showing (as it streams in).

This needs some thought. First task would be to figure out how to get at the reasoning_content in the streaming chunks, maybe in https://github.com/tarasglek/chatcraft.org/blob/main/src/lib/ai.ts#L38-L59.

After that, update https://github.com/tarasglek/chatcraft.org/blob/main/src/lib/ai.ts#L205 to allow passing that extra content in such a way that it can get passed through to the UI.
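Very roughly, the shape could be something like the sketch below; the callback names and types here are invented, and the real code in ai.ts will look different:

// Invented signatures, just to show where reasoning_content could be picked up
// in the streaming loop and forwarded separately to the UI.
type StreamCallbacks = {
  onToken: (token: string) => void;
  onReasoningToken?: (token: string) => void; // new, optional
};

type Chunk = {
  choices: { delta?: { content?: string; reasoning_content?: string } }[];
};

async function consumeStream(stream: AsyncIterable<Chunk>, cb: StreamCallbacks) {
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.reasoning_content) {
      cb.onReasoningToken?.(delta.reasoning_content);
    }
    if (delta?.content) {
      cb.onToken(delta.content);
    }
  }
}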

A bunch more needs to happen, but this would be a good place to begin research.

humphd avatar Nov 07 '25 00:11 humphd

See also https://platform.openai.com/docs/guides/reasoning

humphd avatar Nov 07 '25 00:11 humphd

Hey folks. Just as a UAT in terms of the interleaved Anthropic-style reasoning: the first screenshot is a failure. (The study comes from medRxiv: https://www.medrxiv.org/content/10.1101/2024.10.07.24314963v1.)

Image

Prompt:

Once you have generated your slide, review it and make sure that the text on the slide can fit a standard high level powerpoint slide, suggest edits and then make a second version.

(This is a bad prompt, but just one designed to make it interleave thinking and doing.)

The screenshot at https://github.com/user-attachments/assets/6a06c304-b840-4905-b233-3cc68067125b is what their platform shows.

https://claude.ai/share/a2f55986-ab8d-4e58-ad76-8b51fb0c07c4 is the same thing on Claude. 

It is the thinking -> doing -> thinking loop that will be useful to show to students.

Denubis avatar Nov 07 '25 00:11 Denubis

The reasoning tokens are not counted toward your output token usage—only the final content tokens are billed.

This bit seems inaccurate based on the official docs.

Image

Edit: Also found this reading a bit further. I guess it's different for chat completions.

Image

Amnish04 avatar Nov 07 '25 02:11 Amnish04

Ok, did some more reading, and it seems like the Chat Completions API does not return reasoning tokens/summaries in responses.

Sample chat completion with a reasoning model

{
    "id": "chatcmpl-CZ7XYzx58pkJjk9JAlCaXcIQWbwz2",
    "object": "chat.completion",
    "created": 1762486500,
    "model": "o1-2024-12-17",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The Chat Completions API does not include its internal “chain-of-thought” or reasoning\ntokens in its output by default. It provides a natural language response (the answer\nportion you see), but it does not share the hidden reasoning steps used to generate\nthat response. You can, of course, prompt the model to provide summaries or step-by-step\nsolutions in the text of its reply, but there is no separate “reasoning token” stream\navailable from the API.",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 221,
        "completion_tokens": 1200,
        "total_tokens": 1421,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 1088,
            "audio_tokens": 0,
            "accepted_prediction_tokens": 0,
            "rejected_prediction_tokens": 0
        }
    },
    "service_tier": "default",
    "system_fingerprint": "fp_f05e0cf0c6"
}

But I also see that the conversation started specifically with DeepSeek reasoning models supporting this feature via chat completions interface. Does that mean we are aiming to support this feature just for DeepSeek models? If so, doesn't it go against ChatCraft's philosophy of broad provider support?

And if we are aiming to add support for most reasoning models, which would be through standard OpenAI interface, wouldn't we have to migrate to Responses API first?

e.g. I see that the docs @humphd shared above also refer to Responses API.

Just trying to get a better sense of the goals here :)

Amnish04 avatar Nov 07 '25 03:11 Amnish04

@Amnish04 you're asking good questions, and running into the limits of my knowledge of "reasoning," which I haven't used.

I'm ambivalent about the Responses API, but do note that OpenRouter has support for it.

I, personally, don't want to get into supporting one-off things for a particular model, since that ends up making the code a mess. Our current image and tts stuff is like this now.

As an aside, the Vercel AI components have support for this: https://ai-sdk.dev/elements/components/reasoning. If we ever switch to the AI SDK (which would be smart, I think), this would be another thing we could leverage.

humphd avatar Nov 07 '25 14:11 humphd

I think two competing reasoning standards are starting to emerge.

But right now, basic reasoning support along the lines of https://openrouter.ai/docs/use-cases/reasoning-tokens will give useful parallels for most of the models my students use (Anthropic and Gemini).

Going beyond this for specific providers is likely way out of scope.

However, there is the inline approach using Anthropic's API (with them and the Chinese company going that direction), and then there is whatever OpenAI is going with.

Denubis avatar Nov 07 '25 23:11 Denubis

But I also see that the conversation started specifically with DeepSeek reasoning models supporting this feature via chat completions interface. Does that mean we are aiming to support this feature just for DeepSeek models? If so, doesn't it go against ChatCraft's philosophy of broad provider support?

There are laws/guidance against the use of DeepSeek here in Australia. I would say openrouter.ai only, as the broad cross-platform option.

Otherwise it is OpenAI, Anthropic, Ollama, and Gemini hooks on top.

Denubis avatar Nov 07 '25 23:11 Denubis

I think supporting https://openrouter.ai/docs/use-cases/reasoning-tokens makes the most sense. I'd love to switch to only supporting OpenRouter tbh, but that's beyond this.
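For reference, a rough sketch of what a request with reasoning enabled might look like against OpenRouter's chat completions endpoint, based on my reading of those docs (the field names and model slug should be double-checked before relying on this):

// Sketch based on the OpenRouter reasoning-tokens docs; verify against the docs before using.
const apiKey = "<openrouter-api-key>"; // placeholder
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4.5", // example slug
    messages: [{ role: "user", content: "Why is the sky blue?" }],
    reasoning: { effort: "high" }, // or { max_tokens: 2000 }, per the docs
  }),
});
const data = await res.json();
// When the provider returns reasoning, it should arrive alongside the content:
console.log(data.choices[0].message.reasoning);
console.log(data.choices[0].message.content);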

humphd avatar Nov 08 '25 14:11 humphd