[Bug]: o4-mini and o3-mini display no thoughts
Is there an existing issue for the same bug? (If one exists, thumbs up or comment on the issue instead).
- [x] I have checked the existing issues.
Describe the bug and reproduction steps
When using OpenHands with o4-mini or o3-mini, they display no thoughts in the frontend.
This is confusing to users, who cannot tell why the agent did what it did.
Thanks @kentyman23 for pointing this out.
OpenHands Installation
Docker command in README
OpenHands Version
No response
Operating System
None
Logs, Errors, Screenshots, and Additional Context
No response
NOTABUG: The OpenAI Responses API (unlike the ChatGPT Web App API) does not expose the model's chain of thought in the returned response.
I know the OpenAI API does not reveal the model's internal thoughts, but from a user-experience perspective we want an explanation of what the model is doing so users can follow along. We need to find a way to fix this with o4-mini.
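One common workaround when the API withholds the raw reasoning trace is to ask the model to narrate its own plan in its visible output. The sketch below is a hypothetical illustration of that idea; the prompt wording and the `THOUGHT:` convention are assumptions, not OpenHands' actual prompt format.

```python
import re

# Hypothetical workaround: since the API does not return the model's internal
# chain of thought, instruct the model to self-narrate before each action.
NARRATION_INSTRUCTION = (
    "Before each action, write one short line starting with 'THOUGHT:' "
    "explaining what you are about to do and why."
)

# Matches lines of the form "THOUGHT: <explanation>" in the visible output.
THOUGHT_RE = re.compile(r"^THOUGHT:\s*(.+)$", re.MULTILINE)

def extract_thoughts(model_output: str) -> list[str]:
    """Pull the self-narrated explanations out of a model reply so the
    frontend has something to display even without a raw reasoning trace."""
    return THOUGHT_RE.findall(model_output)

reply = (
    "THOUGHT: The test suite fails on import, so I will inspect setup.py.\n"
    "ACTION: open setup.py\n"
)
print(extract_thoughts(reply))
```

These narrated lines are the model's after-the-fact explanation, not its true hidden reasoning, but they give users something to follow along with.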
The only way would be to use the ChatGPT Web App API, but not even that shows the raw CoT: OpenAI's o-series models aren't trained to produce human-readable chain-of-thought output, and the reasoning trace shown in the Web UI is generated by a separate language model that translates the model's internal gibberish into your language.
I'm confused. In Think->Act->Observe, aren't we supposed to see what it observed (but not the reasoning trace on how it came up with that observation)? As is, these models are basically unusable because they go off the rails without any idea why.
I think the issue is that o3 and o4-mini have internal thoughts that they don't show to users.
> I think the issue is that o3 and o4-mini have internal thoughts that they don't show to users.
Yes, but I guess I'm surprised there aren't external observations to show the users. Aren't those missing, too?
@kentyman23 those are not exposed in the responses API, they are only exposed in ChatGPT web and internal APIs
If nothing else, maybe the interface should recognize that you're using such a model and explain what to expect. Otherwise, I feel like others might think things are broken.
This issue is stale because it has been open for 30 days with no activity. Remove the stale label or leave a comment, or this issue will be closed in 7 days.
NOTABUG
GPT-OSS-120B has a human-readable chain of thought, but it performs worse than the hosted models because of the post-training required to make the chain of thought readable and policy-following.
This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.
I still think OpenHands can make this clearer to the user.
This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.