
Feature: Implement recursive criticism and iteration process in Task model

Open chandrakanth137 opened this issue 10 months ago • 6 comments

Documentation for Recursive Criticism and Iteration (RCI) Methods

Overview

Recursive Criticism and Iteration (RCI) is a systematic process used to iteratively refine AI-generated outputs to enhance their accuracy and alignment with expectations. This process involves three key steps: Critique, Rectify, and Iterate. Each step ensures that the output is logically sound, relevant, and free of factual inconsistencies.

This documentation outlines the implementation of RCI in the CrewAI library, leveraging LangChain-based agents. The implementation consists of three primary methods: critique, rectify, and critique_and_iterate. These methods work in sequence to evaluate, refine, and finalize AI-generated outputs.

Methods

1. critique

This method analyzes an AI-generated output and provides constructive critiques to improve alignment with the given task description and expected output. The critique process strictly avoids minor stylistic or paraphrasing changes and focuses solely on substantive discrepancies.

Parameters:

  • agent (BaseAgent): The agent responsible for processing the task.
  • current_output (str): The output currently under evaluation.
  • description (str): A detailed description of the task.
  • expected_output (str): The anticipated outcome of the task.
  • context (Optional[str]): Additional contextual information relevant to the task.
  • tools (Optional[List[BaseTool]]): A set of tools the agent may utilize during critique.

Returns:

  • critique_response (str): A critique detailing any inconsistencies, errors, or missing elements in the output. If the output is deemed satisfactory, the response will be exactly: NO ISSUES FOUND.
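Since the early-exit logic hinges on this exact sentinel string, the check can be sketched as follows. This is an illustrative helper, not part of the CrewAI API; it assumes the critique response returns `NO ISSUES FOUND` verbatim when the output is satisfactory.

```python
# Illustrative sentinel check for the critique response (assumed behavior,
# not the actual CrewAI implementation).
def is_satisfactory(critique_response: str) -> bool:
    # Strip whitespace so trailing newlines from the LLM don't break the match.
    return critique_response.strip() == "NO ISSUES FOUND"

print(is_satisfactory("NO ISSUES FOUND\n"))           # True
print(is_satisfactory("The answer omits the count."))  # False
```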

2. rectify

This method refines the AI-generated output based on critiques provided by the critique method. It modifies the output only to resolve identified issues, preserving all other aspects of the original response.

Parameters:

  • agent (BaseAgent): The agent responsible for processing the rectification task.
  • critique_response (str): The critique feedback from the critique method.
  • current_output (str): The output to be refined.
  • description (str): A description of the task requirements.
  • expected_output (str): The expected final output.
  • context (Optional[str]): Additional relevant task context.
  • tools (Optional[List[BaseTool]]): Tools available for use in rectification.

Returns:

  • rectified_output (str): A revised version of the output addressing the critiques.

3. critique_and_iterate

This method manages the full recursive process of critique and rectification until the output meets expectations or reaches a predefined maximum iteration count (rci_max_count).

Parameters:

  • agent (BaseAgent): The agent responsible for iterative refinement.
  • initial_output (str): The first AI-generated output.
  • description (str): A description of the task.
  • expected_output (str): The anticipated outcome of the task.
  • context (Optional[str]): Contextual information for task execution.
  • tools (Optional[List[BaseTool]]): Tools available for critique and refinement.

Returns:

  • final_output (str): The refined output after iterative processing.

The method performs multiple iterations of critique and rectification up to rci_max_count. If the critique response is NO ISSUES FOUND, the iteration halts, and the current output is returned as the final output.
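The control flow described above can be sketched in a self-contained form. The stub `critique`/`rectify` callables below stand in for the agent-backed methods; names and behavior are assumed from this description, not taken from the CrewAI codebase.

```python
# Minimal sketch of the critique_and_iterate control flow (illustrative only).
from typing import Callable

NO_ISSUES = "NO ISSUES FOUND"

def critique_and_iterate(
    initial_output: str,
    critique: Callable[[str], str],
    rectify: Callable[[str, str], str],
    rci_max_count: int = 3,
) -> str:
    output = initial_output
    for _ in range(rci_max_count):
        feedback = critique(output)
        if feedback.strip() == NO_ISSUES:
            break  # output meets expectations; stop iterating
        output = rectify(output, feedback)
    return output

# Stub agents: flag the output until it contains the correct count.
def stub_critique(out: str) -> str:
    return NO_ISSUES if "3" in out else "The letter count is wrong."

def stub_rectify(out: str, feedback: str) -> str:
    return 'There are 3 r\'s in "strawberry".'

print(critique_and_iterate("There are 2 r's.", stub_critique, stub_rectify))
```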

Task Class Enhancements

The Task class has been updated to support the RCI process. Key attributes include:

  • rci (bool): Enables or disables the RCI process.
  • rci_max_count (int): Sets the maximum iterations for RCI.
  • description & expected_output: Provide detailed task definitions for accurate critique.
  • agent & tools: Ensure the execution environment has the necessary resources.
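To make the new attributes concrete, here is a stand-in dataclass mirroring the fields listed above. The field names follow this description; the real CrewAI `Task` is a Pydantic model with more fields, so treat this purely as a configuration sketch.

```python
# Illustrative stand-in for the enhanced Task model (not the real CrewAI class).
from dataclasses import dataclass

@dataclass
class TaskConfig:
    description: str
    expected_output: str
    rci: bool = False        # enables or disables the RCI process
    rci_max_count: int = 1   # upper bound on critique/rectify iterations

task = TaskConfig(
    description='Count the r\'s in the word "strawberry".',
    expected_output="A single integer.",
    rci=True,
    rci_max_count=3,
)
print(task.rci, task.rci_max_count)
```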

Summary

The RCI framework (Critique, Rectify, and Iterate) forms a structured method to refine AI-generated outputs efficiently. By continuously evaluating and improving responses, the system ensures alignment with task expectations while reducing errors. This method enhances AI reliability in complex tasks and structured workflows.

chandrakanth137 avatar Feb 26 '25 15:02 chandrakanth137

Some test cases will fail because there are assertions that count how many times the _execute() function is called, and using RCI increases that count.

chandrakanth137 avatar Feb 26 '25 16:02 chandrakanth137

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #2240: Recursive Criticism and Iteration Process

Overview

This pull request introduces a new Recursive Criticism and Iteration (RCI) feature in the Task model, enabling automated quality checks and iterative improvements to task outputs. Below are my findings and suggestions based on the implementation.

Detailed Analysis

1. New Field Additions

The addition of rci and rci_max_count fields is a positive enhancement, allowing for configurable iterations of the RCI process.

Suggested Improvement:

  • Input Validation: Add validations to rci_max_count to ensure no negative numbers are accepted. Additionally, consider imposing a maximum limit on iterations to prevent excessive use.
    rci_max_count: int = Field(
        default=1,
        ge=1,  # Ensure it's at least 1
        le=5,  # Limit to a maximum of 5 iterations
        description="Number of iterations to run the RCI process (1-5).",
    )
    

2. Critique and Iterate Implementation

The implementation in critique_and_iterate effectively integrates the recursive process. However, a few improvements can enhance safety and clarity:

Issues Identified:

  • Missing return type hints in method signatures.
  • Inadequate error handling which could lead to unhandled exceptions during runtime.
  • Potential side effects caused by mutating self.description and self.expected_output.

Suggested Improvements:

  • Add Type Hints and Error Handling:
    def critique_and_iterate(
        self, 
        agent: BaseAgent, 
        initial_output: str, 
        description: str, 
        expected_output: str, 
        context: Optional[str], 
        tools: Optional[List[BaseTool]]
    ) -> str:  # Specify return type
    
  • State Management: Ensure original values are restored correctly after usage to prevent side effects.
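One way to guarantee restoration is a try/finally around the temporary mutation. The attribute names below follow the review comment; the class itself is a minimal stand-in, not the CrewAI `Task`.

```python
# Sketch: restore mutated task state even if the critique/rectify call raises.
class Task:
    def __init__(self, description: str, expected_output: str):
        self.description = description
        self.expected_output = expected_output

    def with_temporary_prompts(self, description: str, expected_output: str) -> str:
        original = (self.description, self.expected_output)
        try:
            self.description = description
            self.expected_output = expected_output
            return f"ran with: {self.description}"
        finally:
            # Runs on success and on exception alike, so no side effects leak.
            self.description, self.expected_output = original

t = Task("original desc", "original out")
t.with_temporary_prompts("critique prompt", "critique out")
print(t.description)  # original desc
```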

3. Critique Method

The critique method generates feedback but has room for enhancements:

Issues Identified:

  • Hardcoded prompt templates may reduce flexibility.
  • There is no format validation for the critique responses.
  • Missing log statements can hinder debugging efforts.

Suggested Improvement:

  • Incorporate Logging:
    logger.debug(f"Starting critique for task: {self.name}")
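For the format-validation gap, one option is to normalize the raw LLM text before comparing it against the sentinel, so variations in case, punctuation, or trailing whitespace don't silently defeat the early-exit check. This is a hedged sketch, not the implemented behavior:

```python
# Illustrative response-format validation for the critique step.
def normalize_critique(raw: str) -> str:
    # Trim whitespace, drop trailing periods, and uppercase for comparison.
    return raw.strip().strip(".").upper()

def has_no_issues(raw: str) -> bool:
    return normalize_critique(raw) == "NO ISSUES FOUND"

print(has_no_issues("no issues found.\n"))               # True
print(has_no_issues("Minor factual error in step 2."))   # False
```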
    

4. General Recommendations

  1. Testing: Ensure unit tests for the RCI process are added to verify functionality across various scenarios, including edge cases.
  2. Documentation Enhancements: Expand docstrings with usage examples and document possible failure modes, clarifying the configuration guidelines for rci_max_count.
  3. Performance Optimizations: Consider implementing caching for repeated critiques and develop a timeout mechanism to manage long-running iterations.
  4. Monitoring and Logging: Introduce logging for success rates and failure metrics to facilitate tracking of improvement percentages over time.

Conclusion

Overall, the RCI feature provides a structured approach to improving task outputs through iteration. However, the implementation requires further refinements to improve robustness, maintainability, and overall performance in production scenarios.

Emphasizing these improvements will not only solidify the feature's reliability but also enhance the user experience and developer maintainability in the long run. Thank you for your efforts on this significant enhancement!

joaomdmoura avatar Feb 26 '25 16:02 joaomdmoura

The changes suggested in the chatbot review have been made in the recent commits.

chandrakanth137 avatar Mar 03 '25 11:03 chandrakanth137

Hey @chandrakanth137!

Could you please share a demo of this feature running and show how the output is improved by using this feature?

Also, it looks like there is a conflict in task.py.

bhancockio avatar Mar 20 '25 13:03 bhancockio

@bhancockio The conflict has been resolved. Regarding the test cases, RCI has use cases with smaller LLMs, especially the ones that struggle to reason. Is there any specific kind of test case that you would like me to demo?

For example: recent LLMs can answer the question "How many r's are in the word 'Strawberry'?", but small LLMs like Llama3:4b, Llama3.2:1b, and Llama3.2:3b struggle to answer it correctly; with RCI, they were able to answer the question correctly.

chandrakanth137 avatar Mar 22 '25 07:03 chandrakanth137

This PR is stale because it has been open for 45 days with no activity.

github-actions[bot] avatar May 06 '25 12:05 github-actions[bot]

@chandrakanth137 Would you mind removing unrelated commits from this PR?

lucasgomide avatar May 14 '25 12:05 lucasgomide

@lucasgomide Okay. Is it okay if I post it as a new PR instead of editing this one?

chandrakanth137 avatar May 14 '25 13:05 chandrakanth137

It's ok

lucasgomide avatar May 14 '25 15:05 lucasgomide