Agent Should Output Planning Text Before Tool Execution

Open CrankGentleman opened this issue 7 months ago • 2 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.4.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Create an agent with multiple tool capabilities
  2. Input a complex query that requires multiple tool calls
  3. Observe the current behavior where the agent:
    • Directly proceeds to tool execution
    • Only shows output after tool execution completes
    • Shows no intermediate planning or thinking process

Current Behavior

Currently, when an agent receives a query:

  1. It immediately starts executing tools without showing its planning process
  2. Users have to wait without any feedback until tool execution completes
  3. For time-consuming operations, it appears as if the system is frozen
  4. There is no visibility into the agent's decision-making process

Expected Behavior

The agent should follow a similar pattern to Cursor's AI agent:

  1. Immediate Initial Response:

    • First analyze and output the overall plan
    • Explain what steps it will take
    • Show its reasoning process
  2. Step-by-Step Execution with Visibility:

    • "I'm going to search the codebase for relevant files..."
    • "Now I'll analyze the file contents..."
    • "I'll need to make the following API calls..."
  3. Progressive Output:

    • Show results after each step
    • Keep user informed of progress
    • Make the execution process transparent
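
The ordering described above can be sketched as a generator that emits the plan first, then an announcement before each tool call, then that call's result. This is only an illustration of the requested behavior, not Dify's actual agent API: `AgentEvent`, `Step`, and `run_agent` are hypothetical names, and the lambdas stand in for real tools.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class AgentEvent:
    kind: str  # "plan", "thought", or "result"
    text: str


@dataclass
class Step:
    announcement: str        # text shown to the user before the tool runs
    tool: Callable[[], str]  # hypothetical stand-in for a (possibly slow) tool call


def run_agent(steps: list[Step]) -> Iterator[AgentEvent]:
    """Yield the overall plan, then narrate and report each step in order."""
    plan = "; ".join(f"{i}. {s.announcement}" for i, s in enumerate(steps, 1))
    yield AgentEvent("plan", plan)
    for step in steps:
        # The announcement is yielded before the tool is invoked,
        # so the user sees it while the tool is still running.
        yield AgentEvent("thought", step.announcement)
        yield AgentEvent("result", step.tool())


steps = [
    Step("Search the codebase for relevant files", lambda: "found 3 files"),
    Step("Analyze the file contents", lambda: "2 files need changes"),
]
events = list(run_agent(steps))
for event in events:
    print(f"[{event.kind}] {event.text}")
```

Because the generator is lazy, a front end that consumes events one at a time could render the plan and each announcement immediately instead of waiting for all tools to finish.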

Reference Implementation

Cursor's AI agent provides an excellent example of this behavior:

  1. It immediately responds with a plan
  2. Shows thinking process before each tool call
  3. Maintains continuous communication with the user
  4. Makes the execution process feel interactive rather than blocking

This creates a much more engaging and transparent experience, where users can follow the agent's thought process and execution steps, similar to pair programming with a human developer.

Why This Matters

  1. Better User Experience:
    • Users understand what the agent is doing
    • Reduces uncertainty during long operations
    • Provides transparency in the decision-making process
  2. More Interactive:
    • Users can see the agent's thought process
    • Helps in debugging and improving agent behavior
  3. Similar to Human Thinking:
    • Plans first, then executes
    • Makes the agent behavior more predictable and trustworthy

Additional Context

This enhancement would significantly improve the agent's usability, especially for complex queries requiring multiple tool calls or time-consuming operations. The planning output would serve as a "thinking aloud" process, making the agent's behavior more transparent and user-friendly.

2. Additional context or comments

No response

CrankGentleman · May 22 '25 03:05

👍 +1 to this request — this is an extremely valuable enhancement.

The current behavior (tool calls happening silently, with no visible reasoning) works functionally but lacks the transparency and responsiveness that users now expect from intelligent agents.

Adding a progressive output mechanism — where the agent shares its plan, intermediate thoughts, and tool call intentions — would:

  • Improve user trust by making decisions visible
  • Make long-running tasks feel interactive rather than blocking
  • Help users understand and debug agent behavior more effectively
  • Align with how tools like Cursor or LangChain's agents behave, offering a more modern and human-like UX

This isn’t just a UX polish — it’s a core usability feature that turns passive responses into engaged, real-time AI collaboration.

Would love to see this prioritized — it would make Dify agents feel much more alive and intelligent. Thanks for the great work so far!

This would also help Dify stand out more in the open-source LLM ecosystem, especially among developers looking for high-interaction agent frameworks.

jhrcc · May 30 '25 07:05

Yes, let's look forward to this moment

CrankGentleman · May 30 '25 07:05

Hi, @CrankGentleman. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

  • You requested that the agent progressively output its planning and reasoning steps during multi-step queries to improve transparency and interactivity.
  • This feature is seen as a key usability improvement and has strong support from maintainers for benefits like increased user trust and better debugging.
  • The issue remains unresolved with no recent updates or implementation progress.
  • The discussion highlights this as an important enhancement aligned with modern agent frameworks.

Next Steps:

  • Please let me know if this feature is still relevant to your use case with the latest version of Dify by commenting on this issue.
  • If I don’t hear back within 15 days, I will automatically close this issue to keep the backlog manageable.

Thank you for your understanding and contribution!

dosubot[bot] · Aug 28 '25 16:08