
Codex model response latency increases significantly as conversation grows

Skyline-23 opened this issue 9 hours ago · 1 comment

Problem

When using Codex models (via OAuth authentication), response latency increases significantly as the conversation grows longer. This happens because the entire conversation history is sent with every API request.

Root Cause

After investigating the codebase:

  1. The OpenAI Responses API returns a responseId in providerMetadata.openai.responseId
  2. The SDK supports previousResponseId option (openai-responses-language-model.ts:285)
  3. However, this feature is never actually used: the responseId returned with each response is not stored, and previousResponseId is never set (see the sketch below)
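
For reference, here is a minimal sketch (not opencode's actual code) of where the id described above surfaces when calling the Responses API through the AI SDK. The model name and prompt are illustrative only.

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

async function main() {
  const result = await generateText({
    model: openai.responses("gpt-4o"), // illustrative model id
    prompt: "Write a hello-world in TypeScript.",
  });

  // The id the issue refers to; opencode currently discards it.
  const responseId = result.providerMetadata?.openai?.responseId;
  console.log(responseId); // e.g. "resp_..."
}

main();
```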

Current Flow

Request 1 → Send full history → Get response with responseId (discarded)
Request 2 → Send full history again → Get response with responseId (discarded)
Request N → Send increasingly large history → Slow response

Expected Flow with previous_response_id

Request 1 → Send full history → Get response with responseId (saved)
Request 2 → Send previousResponseId + new message only → Fast response
Request N → Reference previous response → Consistent speed
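
A follow-up turn could then look roughly like the sketch below, assuming the SDK accepts previousResponseId via providerOptions as noted above. savedResponseId stands in for an id stored from the prior turn; opencode does not persist one today.

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical follow-up request: instead of replaying the full history,
// reference the previous response and send only the new user message.
async function followUp(savedResponseId: string) {
  return generateText({
    model: openai.responses("gpt-4o"), // illustrative model id
    prompt: "Now make that function async.",
    providerOptions: {
      openai: { previousResponseId: savedResponseId },
    },
  });
}
```

The server-side conversation state keyed by the previous response id is what keeps the request payload roughly constant regardless of conversation length.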

Impact

  • Codex API calls become progressively slower as conversations grow
  • Poor user experience with long coding sessions
  • Unnecessary bandwidth and compute usage

Proposed Solution

  1. Add responseId field to AssistantMessage schema
  2. Store responseId from providerMetadata when receiving responses
  3. Pass previousResponseId to subsequent requests via providerOptions (sketched below)
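
A rough sketch of the three steps follows. Apart from AssistantMessage, the field and function names are hypothetical, not actual opencode identifiers.

```typescript
// 1. Extend the assistant message schema with an optional responseId.
interface AssistantMessage {
  role: "assistant";
  content: string;
  responseId?: string; // OpenAI Responses API id, e.g. "resp_..."
}

// 2. Store the id from providerMetadata when a response arrives.
function recordResponseId(
  message: AssistantMessage,
  providerMetadata?: { openai?: { responseId?: string } },
): void {
  message.responseId = providerMetadata?.openai?.responseId;
}

// 3. Thread it into the next request's providerOptions.
function buildProviderOptions(lastAssistant?: AssistantMessage) {
  return lastAssistant?.responseId
    ? { openai: { previousResponseId: lastAssistant.responseId } }
    : undefined;
}
```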
