
feat: use previous_response_id for Codex API optimization

Open Skyline-23 opened this issue 15 hours ago • 1 comment

Summary

Implements previous_response_id support to significantly reduce response latency in long conversations with Codex models.

Fixes #9045

Changes

  • Add responseId field to AssistantMessage schema to persist response IDs
  • Store responseId from providerMetadata.openai.responseId in processor.ts
  • Add previousResponseId to StreamInput type in llm.ts
  • Pass previousResponseId through providerOptions in transform.ts
  • Extract the last assistant message's responseId in prompt.ts and compaction.ts
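
The steps above can be sketched end to end. This is a minimal illustration, not the actual opencode code: the type names (AssistantMessage, Message) and helper names (lastResponseId, buildProviderOptions) are assumptions standing in for the real schema in prompt.ts and transform.ts.

```typescript
// Hypothetical shape of a persisted assistant message; the real schema
// gains an optional responseId field per the changes above.
interface AssistantMessage {
  role: "assistant";
  content: string;
  responseId?: string; // copied from providerMetadata.openai.responseId
}
type Message = AssistantMessage | { role: "user"; content: string };

// prompt.ts / compaction.ts step: walk backwards to find the most
// recent assistant message that carries a responseId.
function lastResponseId(messages: Message[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "assistant" && m.responseId) return m.responseId;
  }
  return undefined;
}

// transform.ts step: thread the id into providerOptions so the next
// request can reference the prior response instead of resending history.
function buildProviderOptions(
  messages: Message[],
): { openai?: { previousResponseId: string } } {
  const previousResponseId = lastResponseId(messages);
  return previousResponseId ? { openai: { previousResponseId } } : {};
}

const history: Message[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "hello", responseId: "resp_abc" },
  { role: "user", content: "continue" },
];
console.log(buildProviderOptions(history));
// → { openai: { previousResponseId: "resp_abc" } }
```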

How it works

Before: Every request sends the entire conversation history, so latency grows at least linearly (O(n)) with conversation length.

After: Subsequent requests reference the previous response via previous_response_id, allowing the API to skip re-processing the conversation history.

Request 1 → Full history → responseId saved
Request 2 → previousResponseId + new content → Fast response
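
The two-request flow can be mimicked with a toy stand-in for the server-side state the Responses API keeps. MockResponsesApi is purely illustrative (the real API stores conversation state keyed by response id on OpenAI's side); it shows that request 2 sends only the new turn yet still resolves against the full context.

```typescript
// Toy model: the "server" remembers each response's full history, so a
// follow-up request can send just the delta plus previous_response_id.
class MockResponsesApi {
  private store = new Map<string, string[]>(); // responseId -> history
  private counter = 0;

  create(input: string, previousResponseId?: string) {
    const prior = previousResponseId
      ? this.store.get(previousResponseId) ?? []
      : [];
    const history = [...prior, input];
    const id = `resp_${++this.counter}`;
    this.store.set(id, history);
    return {
      id,
      tokensSent: input.length,        // only what the client transmitted
      effectiveContext: history.length, // what the model actually sees
    };
  }
}

const api = new MockResponsesApi();
const r1 = api.create("full conversation so far");   // Request 1
const r2 = api.create("new user turn", r1.id);       // Request 2: delta only
// r2 transmits one short turn but is evaluated against both turns.
```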

Testing

  • Build passes successfully
  • The feature only activates for OpenAI providers that return responseId in metadata
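
The activation condition amounts to a guard on the provider metadata. A minimal sketch, assuming the AI SDK-style providerMetadata shape implied above (the ProviderMetadata type here is a hypothetical simplification):

```typescript
// Hypothetical simplified metadata shape; only OpenAI providers
// populate openai.responseId, so the feature stays off elsewhere.
type ProviderMetadata = { openai?: { responseId?: string } };

function extractResponseId(
  meta: ProviderMetadata | undefined,
): string | undefined {
  // Missing metadata, a non-OpenAI provider, or an absent responseId
  // all fall through to undefined, and no id is persisted.
  return meta?.openai?.responseId;
}

console.log(extractResponseId({ openai: { responseId: "resp_123" } })); // "resp_123"
console.log(extractResponseId({})); // undefined
```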

Skyline-23 avatar Jan 17 '26 07:01 Skyline-23