haystack-core-integrations icon indicating copy to clipboard operation
haystack-core-integrations copied to clipboard

feat: Add reasoning support for Cohere chat generator

Open ChinmayBansal opened this issue 3 months ago • 5 comments

Related Issues

  • fixes #2179
  • fixes parent issue in haystack repo: https://github.com/deepset-ai/haystack/issues/9700

Proposed Changes:

This PR implements reasoning support for Cohere's chat generator following the parent issue #9700 requirements. The implementation adds the ability to extract and store model reasoning content in the ReasoningContent field of ChatMessage.

Key Features Added:

  • Reasoning extraction function (_extract_reasoning_from_response) that supports multiple formats:
    • XML-style tags: <thinking>, <reasoning>
    • Section headers: ## Reasoning, ## Thinking, ## My reasoning
    • Natural language patterns: "Let me think", "Step by step", "First,"
    • Intelligent separator detection to determine reasoning boundaries
  • Dual-path response parsing that handles both regular responses and tool call scenarios
  • Streaming support with custom _convert_streaming_chunks_to_chat_message_with_reasoning function
  • Tool call compatibility ensuring reasoning works alongside tool calls without conflicts
  • Content cleaning that removes reasoning markers from the main response text
  • Length filtering with 30-character minimum threshold to avoid false positives

The implementation follows the established patterns from Ollama (#2200) and Google GenAI (#2212) PRs, ensuring consistency across providers.

How did you test it?

  • Unit tests: Added comprehensive test suite (TestReasoningExtraction) covering:
    • Various reasoning pattern formats (<thinking>, <reasoning>, step-by-step)
    • Edge cases (no reasoning, short reasoning ignored)
    • Natural language reasoning detection
  • Integration tests: Added TestCohereChatGeneratorReasoning class with:
    • End-to-end reasoning extraction with Command A Reasoning model (lightweight)
    • Mock response testing for consistent behavior
    • Tool call compatibility verification
  • Manual verification: All unit tests pass, integration tests show expected rate limiting (API working correctly)
  • Code quality: All linting (hatch run fmt) and type checking (hatch run test:types) pass

Notes for the reviewer

  • The reasoning extraction logic is in _extract_reasoning_from_response() function (lines ~355-435)
  • Key integration points are in _parse_response() for both regular and tool call paths (lines ~194-220)
  • Streaming support uses a post-processing approach since Cohere embeds reasoning in response text
  • Test cases use the lightweight command-a-reasoning-111b-2024-10-03 model as recommended
  • Implementation handles the Cohere-specific thinking parameter via generation_kwargs={"thinking": True}

Checklist

  • [x] I have read the contributors guidelines and the code of conduct
  • [x] I have updated the related issue with new insights and changes
  • [x] I added unit tests and updated the docstrings
  • [x] I've used one of the conventional commit types for my PR title: fix:,feat:, build:, chore:, ci:, docs:,style:, refactor:, perf:,test:.

ChinmayBansal avatar Sep 12 '25 06:09 ChinmayBansal

@sjrl I have addressed your feedback. The changes I have made are using the native method while keeping the fallback method if the native method does not apply reasoning content. I have also updated the test cases.

Let me know if more changes are needed!

ChinmayBansal avatar Nov 14 '25 22:11 ChinmayBansal

Hey @ChinmayBansal I noticed in your PR description you also mention adding support for reasoning content when streaming as well, but I don't see any changes in the code to reflect this. Is this something you could also add?

sjrl avatar Nov 18 '25 08:11 sjrl

@sjrl I had included it in the text-based fallback extraction thinking it was needed for streaming. After doing some more research, I found that streaming API does support thinking with chunk.delta.message.content.thinking in content delta events similar to the non streaming structure. I did remove the fallback logic. The current implementation now uses only the native API for non streaming responses. For streaming, it would have to come through content delta events but Haystack's StreamingChunk does not currently have a reasoning field to handle this. This could be added as a followup if needed.

ChinmayBansal avatar Nov 19 '25 05:11 ChinmayBansal

For streaming, it would have to come through content delta events but Haystack's StreamingChunk does not currently have a reasoning field to handle this. This could be added as a followup if needed.

StreamingChunk does have a field for this called reasoning. The docs are out of date on this but if your check our API docs here https://docs.haystack.deepset.ai/reference/data-classes-api#streamingchunk you can see that StreamingChunk has a reasoning field.

I'd recommend checking out how it's used in the OpenAIRresponsesChatGenerator here https://github.com/deepset-ai/haystack/blob/0100a4849abbe1cbaa6d2b1cdbb3e1d2a471c3ff/haystack/components/generators/chat/openai_responses.py#L579

sjrl avatar Nov 19 '25 07:11 sjrl

@sjrl I have added streaming support for reasoning content. I handled the content-deltas now and created reasoning content objects for thinking deltas. Since the StreamingChunk can have one of content/tool_calls/reasoning set at a time, I prioritized thinking content over text when both are there.

I also added test cases and followed the OpenAI implementation.

ChinmayBansal avatar Nov 19 '25 07:11 ChinmayBansal