
Feature Request: Streaming Token Generation with Mid-Generation Tool Execution

Open Mr-Ye-Cao opened this issue 3 weeks ago • 1 comment

Summary

Request for streaming token generation that can pause mid-stream, execute a tool, and append its result before generation continues. This would enable agentic tool-use patterns in which the model expects tool results inline.

Problem Statement

Current Behavior

In the current Tinker architecture, model generation is atomic:

# tinker_cookbook/rl/rollouts.py
ac_with_logprobs = await policy(ob, stop_condition)  # Complete generation
step_result = await env.step(ac_with_logprobs.tokens)  # Process AFTER generation

SamplingClient.sample_async() returns only the final, complete token sequence; intermediate tokens are not exposed.

The Issue

Models trained for tool use (e.g., GPT-OSS, function-calling models) expect a specific interaction pattern:

Model: <analysis>Let me check the file</analysis>
Model: <tool_call>{"command": "cat file.txt"}</tool_call>
System: [Tool result appended inline] file contents here...
Model: <analysis>I see the file contains...</analysis>
Model: <tool_call>{"command": "echo 'fixed' > file.txt"}</tool_call>
System: [Tool result appended inline]
Model: <final_answer>Done!</final_answer>

But with atomic generation, we get:

Model: <analysis>Let me check the file</analysis>
Model: <tool_call>{"command": "cat file.txt"}</tool_call>
Model: [HALLUCINATED] The file probably contains...  <-- Model guesses without seeing result
Model: <tool_call>{"command": "echo 'fixed' > file.txt"}</tool_call>
Model: [HALLUCINATED] Command executed successfully
Model: <final_answer>Done!</final_answer>

The model hallucinates tool results because it doesn't receive actual feedback inline.

Mr-Ye-Cao avatar Nov 23 '25 03:11 Mr-Ye-Cao

I may be missing something, but I believe you can achieve this by just adding the tool-call end as a stop condition. You can do

sampling_client.sample(
    sampling_params=tinker.SamplingParams(..., stop=<list of stop strings> or <list of stop tokens>)
)

to set the stop condition to anything.

Or, for the specific code you linked to in the cookbook, you can add the token corresponding to the tool-call end to the list of stop_condition tokens.
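As a rough illustration of the stop-condition approach, here is a minimal sketch of the loop: sample until the tool-call end marker, run the tool, append its output, and sample again. Nothing here is Tinker API. `generate` is a hypothetical callable standing in for whatever sampling call you use with the stop condition set, `run_tool` is a hypothetical tool executor, and the `<tool_result>` wrapper is an assumed format that should be replaced with whatever your model was trained on. The sketch accepts continuations returned either with or without the stop string.

```python
import json
from typing import Callable, List

TOOL_CALL_OPEN = "<tool_call>"
TOOL_CALL_END = "</tool_call>"


def run_with_tools(
    generate: Callable[[str, List[str]], str],  # hypothetical: (text so far, stop strings) -> continuation
    run_tool: Callable[[dict], str],            # hypothetical: parsed tool call -> tool output
    prompt: str,
    max_turns: int = 8,
) -> str:
    """Alternate sampling and tool execution until the model stops calling tools."""
    transcript = prompt
    for _ in range(max_turns):
        chunk = generate(transcript, [TOOL_CALL_END])
        # Samplers differ on whether the stop string is returned; normalize by stripping it.
        if chunk.endswith(TOOL_CALL_END):
            chunk = chunk[: -len(TOOL_CALL_END)]
        transcript += chunk
        start = chunk.rfind(TOOL_CALL_OPEN)
        if start == -1:
            break  # no tool call in this chunk, so the model has finished
        transcript += TOOL_CALL_END
        call = json.loads(chunk[start + len(TOOL_CALL_OPEN):])
        # Assumed result format; the real one depends on the model's chat template.
        transcript += f"\n<tool_result>{run_tool(call)}</tool_result>\n"
    return transcript


def fake_generate(text: str, stop: List[str]) -> str:
    """Scripted stand-in for a model: one tool call, then a final answer."""
    if "<tool_result>" not in text:
        return '<analysis>Let me check the file</analysis><tool_call>{"command": "cat file.txt"}'
    return "<final_answer>Done!</final_answer>"


if __name__ == "__main__":
    print(run_with_tools(fake_generate, lambda call: "ran: " + call["command"], ""))
```

In an RL rollout, the same idea would map to adding the tool-call end token to the stop condition and letting the environment's step append the tool output to the next observation.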

erikwijmans avatar Nov 24 '25 01:11 erikwijmans

Closing due to inactivity!

Tiiiger avatar Dec 02 '25 22:12 Tiiiger