Added Advance Configuration Docs for Rag Tool

Open Vidit-Ostwal opened this issue 7 months ago • 5 comments

Added additional docs for the RAG tool's advanced configuration. Fixes #2671.

Vidit-Ostwal avatar Apr 29 '25 07:04 Vidit-Ostwal

Disclaimer: This review was made by a crew of AI Agents.

Thank you for the contribution to enhance the RAG Tool documentation with an advanced configuration section. This update substantially improves the usability for advanced users by providing a realistic example configuration covering embedding, vector database (Elasticsearch), and chunking parameters. Including the note about the Embedchain adapter and linking to its documentation is a valuable addition that helps users understand the integration points.

Here are some detailed findings and specific suggestions for improvement to elevate the quality and clarity of this documentation update:

Key Findings

  • The new "Advance Configuration" section is well-structured and presents the configuration example clearly, making it easier for users to customize the RagTool.
  • The code snippet uses consistent indentation and formatting, making it readable.
  • Linking to the Embedchain documentation improves discoverability and provides a good external reference.

Suggestions for Improvement

  1. Section Heading Typo

    • Change the heading from:
      ## Advance Configuration
      
      to the grammatically correct:
      ## Advanced Configuration
      

    This small fix avoids any perception of haste and improves professionalism.

  2. Sensitive Data Management

    • The example contains "api_key": "your-key", which could unintentionally encourage users to hardcode secrets.
    • Add an inline comment to the example and a clear note just below the code snippet emphasizing secure secrets handling.

    Example inline comment:

    "api_key": "your-key",  # Use environment variables or a secret manager for sensitive keys
    

    And a note below the snippet:

    Note: Never hardcode secrets or API keys in configuration files. Use environment variables or secret management services.

  3. Boolean Values Formatting

    • In the example, verify_certs is set to False — this follows Python style, but since this config is shown in a JSON-like style within MDX/Markdown, it's best to use JSON boolean lowercase for clarity:
    "verify_certs": false
    
  4. Enhance Context on Embedchain

    • The current sentence:

      The internal RAG tool utilizes the Embedchain adapter...

    Could be improved by briefly describing Embedchain’s role:

    The internal RAG tool utilizes the [Embedchain](https://docs.embedchain.ai/components/introduction) adapter, an extensible framework for retrieval-augmented generation, allowing you to pass any supported configuration options.
    

    This helps users unfamiliar with Embedchain understand its purpose quickly.

  5. Clarify .yaml File Reference

    • The statement:

      Make sure to review the configuration options available in the .yaml file.

    Could confuse users about which YAML file is referenced.

    Recommendation:

    Make sure to review the configuration options available in your project's Embedchain `.yaml` configuration file (see the Embedchain documentation for details).
    

    This clarifies that it’s a project-specific configuration file.
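
Suggestion 2 (keeping secrets out of the config) could be sketched roughly as below. This is only an illustration under stated assumptions, not part of the PR: the helper `build_vectordb_config` and the environment variable names `ELASTICSEARCH_API_KEY` and `ELASTICSEARCH_CLOUD_ID` are hypothetical, while the config keys are copied from the example under review.

```python
import os


def build_vectordb_config(env=os.environ):
    """Build the `vectordb` section, reading secrets from the environment.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    try:
        api_key = env["ELASTICSEARCH_API_KEY"]
        cloud_id = env["ELASTICSEARCH_CLOUD_ID"]
    except KeyError as missing:
        # Fail early with a clear message instead of passing an empty key along.
        raise RuntimeError(
            f"Missing required environment variable: {missing}"
        ) from None

    return {
        "provider": "elasticsearch",
        "config": {
            "collection_name": "my-collection",
            "cloud_id": cloud_id,
            "api_key": api_key,
            # Python literal; lowercase `false` is only valid in JSON/YAML.
            "verify_certs": False,
        },
    }
```

The returned dict would then be placed under the `"vectordb"` key of the `config` passed to `RagTool`.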

Example of Suggested Revised Snippet

## Advanced Configuration

You can use advanced configuration for the RAG Tool by passing supported options to the `RagTool` class. Here’s an example:

```python
config = {
    "embedding": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-ada-002"
        }
    },
    "vectordb": {
        "provider": "elasticsearch",
        "config": {
            "collection_name": "my-collection",
            "cloud_id": "deployment-name:xxxx",
            "api_key": "your-key",  # Use environment variables or a secret manager for sensitive keys
            "verify_certs": false
        }
    },
    "chunker": {
        "chunk_size": 400,
        "chunk_overlap": 100,
        "length_function": "len",
        "min_chunk_size": 0
    }
}

rag_tool = RagTool(config=config, summarize=True)
```

Note: Never hardcode secrets or API keys in configuration files. Use environment variables or secret management services.

The internal RAG tool utilizes the Embedchain adapter, an extensible framework for retrieval-augmented generation, allowing you to pass any configuration options that are supported.
Make sure to review the configuration options available in your project's Embedchain .yaml configuration file (see the Embedchain documentation for details).

Conclusion


### Summary Table of Issues & Recommendations

| Issue                        | Recommendation                                             |
|------------------------------|------------------------------------------------------------|
| Typo in section heading       | Change "Advance" → "Advanced"                              |
| Hardcoded API key example     | Add inline comment and note about secrets management      |
| Boolean capitalization       | Use lowercase `false` instead of `False`                   |
| Embedchain context missing    | Add brief description about Embedchain                     |
| Ambiguous `.yaml` file ref    | Specify reference to project’s Embedchain config file      |

---

### Overall Assessment

This PR adds substantial value to the documentation, aiding advanced users in customizing the RAG Tool. The inclusion of an example config with embedding, vector DB, and chunking parameters is well done. Linking to the external Embedchain docs helps maintain a modular documentation approach.

With the recommended minor corrections—mostly editorial and security best practices—this contribution will be well polished, professional, and educational.

Thank you for your efforts on improving the crewAI project documentation! Looking forward to the updated version with these suggested enhancements.

mplachta avatar Apr 29 '25 07:04 mplachta

@lucasgomide, any idea on this one? The test case is not failing; I think the run is just taking a long time (>15 mins). Not sure why.

Vidit-Ostwal avatar Apr 29 '25 13:04 Vidit-Ostwal

There are several failing tests:

tests/crew_test.py::test_crew_with_delegating_agents FAILED              [ 35%]
tests/crew_test.py::test_crew_with_delegating_agents_should_not_override_task_tools PASSED [ 35%]
tests/crew_test.py::test_crew_with_delegating_agents_should_not_override_agent_tools PASSED [ 35%]
tests/crew_test.py::test_task_tools_override_agent_tools FAILED          [ 35%]
tests/crew_test.py::test_task_tools_override_agent_tools_with_allow_delegation PASSED [ 35%]
tests/crew_test.py::test_crew_verbose_output FAILED                      [ 35%]
tests/crew_test.py::test_cache_hitting_between_agents FAILED             [ 35%]
tests/crew_test.py::test_api_calls_throttling FAILED                     [ 35%]
tests/crew_test.py::test_crew_kickoff_usage_metrics FAILED    

lucasgomide avatar Apr 29 '25 14:04 lucasgomide

Oh, will check those out.

Vidit-Ostwal avatar Apr 29 '25 15:04 Vidit-Ostwal

@Vidit-Ostwal I just noticed something weird with our test suite. I'm looking into it right now.

lucasgomide avatar Apr 29 '25 18:04 lucasgomide