refactor(plugins/compat-oai): use ChatCompletionAccumulator for strea…

Open eric642 opened this issue 1 month ago • 3 comments

  • Simplified generateStream by using openai-go's ChatCompletionAccumulator
  • Removed manual tool call accumulation logic (currentToolCall, toolCallCollects)
  • Created convertChatCompletionToModelResponse helper for unified response conversion
  • Added support for detailed token usage fields:
    • ThoughtsTokens (reasoning tokens)
    • CachedContentTokens (cached tokens)
    • Audio and prediction tokens (in the custom field)
  • Added support for refusal messages and system fingerprint metadata
  • Refactored generateComplete to reuse convertChatCompletionToModelResponse
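
For reviewers unfamiliar with the streaming format, here is a minimal sketch of the kind of per-index tool call accumulation that an accumulator takes over from hand-written code. The types and the function name (`toolCallDelta`, `toolCall`, `accumulateToolCalls`) are hypothetical and simplified for illustration; they are neither the openai-go types nor the code removed in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// toolCallDelta is a hypothetical, simplified stand-in for one streamed
// tool-call fragment. It is NOT an openai-go type.
type toolCallDelta struct {
	Index     int    // position of the tool call within the assistant message
	ID        string // usually present only on the first fragment
	Name      string // usually present only on the first fragment
	Arguments string // JSON arguments, delivered in pieces
}

// toolCall is the fully assembled call (also hypothetical).
type toolCall struct {
	ID        string
	Name      string
	Arguments string
}

// pending holds the partial state for one tool call while streaming.
type pending struct {
	id, name string
	args     strings.Builder
}

// accumulateToolCalls merges fragments that share an Index into one call,
// concatenating the argument pieces in arrival order.
func accumulateToolCalls(deltas []toolCallDelta) []toolCall {
	var calls []*pending
	for _, d := range deltas {
		if d.Index < 0 {
			continue // ignore malformed fragments in this sketch
		}
		// Grow the slice so that calls[d.Index] exists.
		for len(calls) <= d.Index {
			calls = append(calls, &pending{})
		}
		p := calls[d.Index]
		if d.ID != "" {
			p.id = d.ID
		}
		if d.Name != "" {
			p.name = d.Name
		}
		p.args.WriteString(d.Arguments)
	}

	out := make([]toolCall, 0, len(calls))
	for _, p := range calls {
		out = append(out, toolCall{ID: p.id, Name: p.name, Arguments: p.args.String()})
	}
	return out
}

func main() {
	deltas := []toolCallDelta{
		{Index: 0, ID: "call_1", Name: "get_weather", Arguments: `{"city":`},
		{Index: 1, ID: "call_2", Name: "get_time", Arguments: `{}`},
		{Index: 0, Arguments: `"Paris"}`},
	}
	for _, c := range accumulateToolCalls(deltas) {
		fmt.Printf("%s %s %s\n", c.ID, c.Name, c.Arguments)
	}
}
```

Delegating this bookkeeping to the library's accumulator means the plugin no longer needs to track partial state such as currentToolCall and toolCallCollects itself.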

Checklist (if applicable):

  • [ ] PR title is following https://www.conventionalcommits.org/en/v1.0.0/
  • [ ] Tested (manually, unit tested, etc.)
  • [ ] Docs updated (updated docs or a docs bug required)

eric642, Nov 19 '25 09:11

@hugoaguirre @apascal07 please review.

eric642, Nov 20 '25 03:11

Hi Eric,

Thank you for this PR. I have given it a quick look and will do a more thorough review once I have investigated how, if at all, the changes to the tools API will interact with our in-progress implementation of multi-part tool responses, which you are eager to see supported. We'd like to roll multiple API changes into one release. Thanks for your patience!

apascal07, Nov 25 '25 18:11

@apascal07 OK. Thank you very much for your patience in handling this.

eric642, Nov 26 '25 01:11

Some updates: your idea to support parallel and sequential tools is a good one, but we want to take it a step further. Rather than assuming that all parallel tools must go first (or vice versa), we want to run alternating stages (all parallel, then all sequential, then all parallel, and so on) based on which tool requests the model returns and whether each tool is marked as parallel. We're discussing the details of the design now and will implement it shortly after.
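
As a rough illustration of the staging idea only, the sketch below groups consecutive tool requests that share the same mode into one stage. Everything here (`toolRequest`, `stage`, `planStages`, and the grouping rule itself) is a hypothetical reading of the description above, not the design under discussion, which is not finalized.

```go
package main

import "fmt"

// toolRequest is a hypothetical tool request as returned by the model,
// tagged with whether the tool is marked as safe to run in parallel.
type toolRequest struct {
	Name     string
	Parallel bool
}

// stage is a run of consecutive requests that share the same mode.
type stage struct {
	Parallel bool
	Requests []toolRequest
}

// planStages walks the requests in the order the model returned them and
// starts a new stage whenever the parallel/sequential marking changes.
func planStages(reqs []toolRequest) []stage {
	var stages []stage
	for _, r := range reqs {
		n := len(stages)
		if n == 0 || stages[n-1].Parallel != r.Parallel {
			stages = append(stages, stage{Parallel: r.Parallel})
			n++
		}
		stages[n-1].Requests = append(stages[n-1].Requests, r)
	}
	return stages
}

func main() {
	reqs := []toolRequest{
		{Name: "search", Parallel: true},
		{Name: "fetch", Parallel: true},
		{Name: "writeFile", Parallel: false},
		{Name: "summarize", Parallel: true},
	}
	for i, s := range planStages(reqs) {
		fmt.Printf("stage %d parallel=%v tools=%d\n", i, s.Parallel, len(s.Requests))
	}
}
```

Under this assumption, an executor could run the requests within a parallel stage concurrently and the requests within a sequential stage one at a time, finishing each stage before starting the next.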

apascal07, Dec 01 '25 16:12

OK, I'll close this PR and submit a new one: refactor(plugins/compat-oai): use ChatCompletionAccumulator for streaming

eric642, Dec 02 '25 01:12