[RFC] Executor: making Ragas faster and more reliable
Problem - Ragas is slow and unreliable
- Ragas is not exploiting the concurrency options provided by the `ThreadPoolExecutor` and `asyncio` modules. This is because Ragas took a batching approach to evaluation, i.e. it evaluated metrics in fixed batches (a sketch of the per-row alternative follows this list)
- Not every service has async support - we need options to keep evaluation synchronous, with no concurrency at all
- we need these primitives for #380 and potentially others as well
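To make the contrast concrete, here is a minimal sketch (not ragas code) of per-row scheduling with `asyncio.gather`: every row is submitted at once, so one slow row no longer stalls an entire batch the way fixed-size batching does.

```python
import asyncio

# Minimal sketch, not the ragas implementation: `metric.ascore(row)` is the
# per-row coroutine proposed in this RFC.
async def evaluate_rows(metric, rows) -> list[float]:
    # schedule every row concurrently instead of in fixed batches
    return await asyncio.gather(*(metric.ascore(row) for row in rows))
```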
Core Components
- `BaseMetric` - a metric that evaluates a single row and scores it with both `score()` and `ascore()`
- `RagasLLM` that is based on `langchain-core` LLMs
- `Prompt` object with provision for instructions and demonstrations, which converts to messages or prompts supported by both langchain chat-based and completion-based LLMs
- `LLMResult` object that supports both chat- and text-based outputs
- `Executor` that runs `BaseMetric`. It should also be able to run testset generators, so this should be a common paradigm (see the sketch after this list)
- new `evaluate()` function that makes it easier to
    - change LLM and embeddings - with the new approach, `BaseMetric` will have `llm=None` by default and will take the default LLM from the `evaluate()` function; if `metric.llm is not None`, the LLM provided on the metric is used instead
    - switch between async and threading
    - support callbacks throughout
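A minimal sketch of what such an `Executor` could look like, assuming `submit()`/`results()` methods and an `is_async` switch; the names and internals here are illustrative, not a finalized API:

```python
import asyncio
import typing as t
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch only - `Executor`, `submit`, and `results` are
# assumed names for this RFC, not a finalized ragas API.
class Executor:
    def __init__(self, is_async: bool = True, max_workers: t.Optional[int] = None):
        self.is_async = is_async
        self.max_workers = max_workers
        self.jobs: t.List[t.Tuple[t.Callable, tuple]] = []

    def submit(self, fn: t.Callable, *args) -> None:
        # queue a job: a coroutine function when is_async, a plain callable otherwise
        self.jobs.append((fn, args))

    def results(self) -> t.List[t.Any]:
        if self.is_async:
            # run all queued coroutines concurrently on one event loop
            async def _run():
                return await asyncio.gather(*(fn(*args) for fn, args in self.jobs))
            return asyncio.run(_run())
        # fall back to threads when the underlying service has no async support
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(fn, *args) for fn, args in self.jobs]
            return [f.result() for f in futures]
```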
Base classes
Metric
```python
import typing as t

from langchain_core.callbacks import Callbacks


class Metric:
    def score(
        self,
        row,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...

    async def ascore(
        self,
        row,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...
```
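For illustration, a concrete metric against this interface might look like the following; `ExactMatch` and the `answer`/`ground_truth` row keys are invented for this example, not part of ragas:

```python
import typing as t

from langchain_core.callbacks import Callbacks


# Invented example metric, purely for illustration; not part of ragas.
class ExactMatch(Metric):
    def score(self, row, callbacks: t.Optional[Callbacks] = None) -> float:
        # 1.0 when the generated answer matches the ground truth exactly
        return float(row["answer"].strip() == row["ground_truth"].strip())

    async def ascore(self, row, callbacks: t.Optional[Callbacks] = None) -> float:
        # no I/O here, so the async path simply reuses the sync logic
        return self.score(row, callbacks)
```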
evaluate()
```python
def evaluate(
    dataset: Dataset,
    metrics: list[Metric] | None = None,
    llm: t.Optional[BaseRagasLLM] = None,
    embeddings: t.Optional[RagasEmbeddings] = None,
    callbacks: Callbacks = [],
    is_async: bool = True,
    max_workers: t.Optional[int] = None,
    raise_exceptions: bool = True,
    column_map: t.Dict[str, str] = {},
) -> Result:
    ...
```
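To show how the proposed knobs fit together, here is a hedged usage example: `faithfulness` is an existing ragas metric, while `my_ragas_llm`, the dataset columns, and the behavior of `raise_exceptions=False` are assumptions about the new API, not confirmed details:

```python
from datasets import Dataset

from ragas.metrics import faithfulness

ds = Dataset.from_dict({
    "question": ["What is Ragas?"],
    "contexts": [["Ragas is an evaluation framework for RAG pipelines."]],
    "answer": ["An evaluation framework for RAG pipelines."],
})

result = evaluate(
    ds,
    metrics=[faithfulness],   # faithfulness.llm is None, so the default llm below applies
    llm=my_ragas_llm,         # placeholder: any BaseRagasLLM implementation
    is_async=True,            # schedule rows with asyncio; False falls back to threads
    raise_exceptions=False,   # assumption: failed rows are reported instead of aborting
)
```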
BaseRagasLLM
```python
import typing as t
from abc import ABC, abstractmethod
from dataclasses import dataclass

from langchain_core.callbacks import Callbacks
from langchain_core.outputs import LLMResult


@dataclass
class BaseRagasLLM(ABC):
    @abstractmethod
    def generate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...

    @abstractmethod
    async def agenerate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...
```
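Since `RagasLLM` is meant to be based on `langchain-core`, one possible concrete implementation is a thin wrapper around a langchain model. This is a sketch under assumptions: the wrapper name and the `prompt.to_prompt_value()` conversion on the ragas `Prompt` are illustrative, not confirmed API:

```python
from dataclasses import dataclass

from langchain_core.language_models import BaseLanguageModel
from langchain_core.outputs import LLMResult


# Illustrative sketch of a langchain-backed implementation; the class name
# and `prompt.to_prompt_value()` are assumptions, not finalized ragas API.
@dataclass
class LangchainLLMWrapper(BaseRagasLLM):
    langchain_llm: BaseLanguageModel

    def generate_text(self, prompt, n=1, temperature=1e-8, stop=None, callbacks=None) -> LLMResult:
        # delegate to langchain's sync API; temperature is left to the wrapped model here
        return self.langchain_llm.generate_prompt(
            prompts=[prompt.to_prompt_value()] * n, stop=stop, callbacks=callbacks
        )

    async def agenerate_text(self, prompt, n=1, temperature=1e-8, stop=None, callbacks=None) -> LLMResult:
        # the async counterpart, so the Executor can gather many calls at once
        return await self.langchain_llm.agenerate_prompt(
            prompts=[prompt.to_prompt_value()] * n, stop=stop, callbacks=callbacks
        )
```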
List of issues this will address
- #387
- #383
- #271
- #303
- #330
- #376
- #343
- #375
- #282
- #369
- #367
- #108 removing HF dataset
- #286
- #413
- #414
Make embeddings faster
- #361
Hey @jjmachan! Thanks for all your work on ragas, I really appreciate it. I am trying to use it to evaluate my chatbot created with llama-index. Have any workarounds been discovered for issue #271?
These are my dependencies:

```
%pip install ragas==0.0.22
%pip install pypdf
%pip install llama-index==0.8.52
%pip install langchain==0.0.331rc3
%pip install openai==0.28.1
```