
[RFC] Executor: making Ragas faster and more reliable

Open jjmachan opened this issue 2 years ago • 2 comments

Problem - ragas is slow and unreliable

  1. Ragas is not exploiting the concurrency options provided by the ThreadPoolExecutor and asyncio modules. This is because ragas took a batching approach to evaluation, i.e. it evaluated metrics in batches.
  2. Not every service has async support - we need options to stay sync, or to run with no concurrency at all.
  3. We need these primitives for #380 and potentially other features as well.

Core Components

  1. BaseMetric - a metric that evaluates a single row with both score() and ascore()
  2. RagasLLM that is based on langchain-core LLMs
    1. a Prompt object, with provisions for instructions and demonstrations, that converts to the messages or prompts supported by both langchain chat-based and completion-based models
    2. an LLMResult object that supports both chat and text-based outputs
  3. Executor that runs BaseMetric. It should also be able to run testset generators, so this should be a common paradigm.
  4. new evaluate() function that makes it easier to
    1. change the llm and embeddings - with the new method, BaseMetric will have llm=None by default and will take the default llm from the evaluate() function; if metric.llm is not None, the llm set on the metric is used instead
    2. switch between async and threading
    3. support callbacks throughout

Base classes

Metric

class Metric:
    def score(
        self,
        row,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...

    async def ascore(
        self,
        row,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...

evaluate()

def evaluate(
    dataset: Dataset,
    metrics: list[Metric] | None = None,
    llm: t.Optional[BaseRagasLLM] = None,
    embeddings: t.Optional[RagasEmbeddings] = None,
    callbacks: Callbacks = [],
    is_async: bool = True,
    max_workers: t.Optional[int] = None,
    raise_exceptions: bool = True,
    column_map: t.Dict[str, str] = {},
) -> Result:
    ...
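The llm-resolution rule from point 4.1 above (metric.llm wins, otherwise the evaluate() default is used) can be sketched as a small helper. `resolve_llm` is an illustrative name, not the real implementation:

```python
import typing as t


def resolve_llm(
    metric_llm: t.Optional[object],
    default_llm: t.Optional[object],
) -> object:
    """Pick the llm a metric should use: its own if set, else the default."""
    if metric_llm is not None:
        # the metric was explicitly configured, so its llm takes priority
        return metric_llm
    if default_llm is None:
        raise ValueError(
            "no llm configured: pass llm= to evaluate() or set metric.llm"
        )
    return default_llm
```

The same fallback shape would apply to `embeddings`, letting a single `evaluate(dataset, llm=...)` call configure every metric that was left at `llm=None`.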

BaseRagasLLM

@dataclass
class BaseRagasLLM(ABC):
    @abstractmethod
    def generate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...

    @abstractmethod
    async def agenerate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...

jjmachan avatar Dec 19 '23 14:12 jjmachan

list of issues this will address

  • #387
  • #383
  • #271
  • #303
  • #330
  • #376
  • #343
  • #375
  • #282
  • #369
  • #367
  • #108 removing HF dataset
  • #286
  • #413
  • #414

Make embeddings faster

  • #361

jjmachan avatar Jan 08 '24 10:01 jjmachan

Hey @jjmachan! Thanks for all your work on ragas, I really appreciate it. I am trying to use it to evaluate my chatbot built with llama-index. Have any workarounds been discovered for issue #271?

These are my dependencies:

    %pip install ragas==0.0.22
    %pip install pypdf
    %pip install llama-index==0.8.52
    %pip install langchain==0.0.331rc3
    %pip install openai==0.28.1

iterakhtaras avatar Mar 28 '24 14:03 iterakhtaras