Feature: Optional Handit.ai integration to auto-fix prompts after Ragas evaluations

Open ccgomezn opened this issue 4 months ago • 1 comments

Hi team 👋

Love what you’ve built with Ragas — it’s become a go-to tool for evaluating LLM apps.

We’ve been thinking about an optional feature that could make Ragas evaluations even more actionable: integrating Handit.ai, our open-source “autonomous engineer” that monitors and fixes AI 24/7.

Problem / Opportunity

Ragas gives developers great evaluation metrics, but acting on failed or low-scoring results is usually a manual process.
This slows down iteration and keeps improvements separate from the evaluation workflow.

Proposed Solution

Add an optional parameter (e.g., handit_enabled=True) that:

Sends low-scoring or failed evaluation samples, along with their context, to Handit.
Lets Handit automatically suggest or apply prompt/agent improvements.
Lets users review the suggested fixes or auto-apply them.

With one extra flag, users could go from just evaluating to evaluating and fixing automatically.
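
To make the idea concrete, here is a rough sketch of how the opt-in flag might behave. Everything in it is hypothetical: `EvalSample`, `forward_low_scores`, the `threshold` default, and the `send` callback are placeholder names for illustration, not part of the Ragas or Handit APIs. The real integration would plug into whatever result objects Ragas exposes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EvalSample:
    """One evaluated sample (hypothetical shape, not a Ragas type)."""
    question: str
    answer: str
    context: str
    score: float


def forward_low_scores(
    samples: List[EvalSample],
    handit_enabled: bool = False,
    threshold: float = 0.5,
    send: Optional[Callable[[EvalSample], None]] = None,
) -> List[EvalSample]:
    """Forward samples scoring below `threshold` and return them.

    Returns an empty list when the flag is off or no sender is given,
    so the default behaviour is unchanged (100% opt-in).
    """
    if not handit_enabled or send is None:
        return []
    # Keep only the samples whose score falls below the cutoff.
    low = [s for s in samples if s.score < threshold]
    for sample in low:
        # `send` stands in for the call that would hand the sample to Handit.
        send(sample)
    return low
```

Passing the sender as a callback keeps the evaluation code free of any hard dependency on an external client, and the flag defaulting to `False` means existing users see no change unless they opt in.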

Benefits

Speeds up the evaluation → improvement cycle.
No extra setup for Ragas users.
100% opt-in.
Integration maintained by the Handit team (no added load for Ragas maintainers).

Next Steps

If the team is open to it, we can prepare a PR adding this as an optional enhancement.

What do you think?

— Cristhian @ Handit.ai

Aug 11 '25 12:08 ccgomezn

Sounds like an interesting integration, @ccgomezn.

Please check out the newer metrics collections approach using instructor via llm_factory and feel free to open a PR for this integration following that :) Would be happy to merge it.

Nov 04 '25 11:11 anistark