Handle large context windows using Ollama's LLMs for evaluation purposes
- [x] I checked the documentation and related resources and couldn't find an answer to my question.
I integrated Langfuse with Ragas for evaluation purposes. Issue #53 mentions that by default "we are using 'gpt-3.5-16k', so you don't have to reduce document size when running the evals." However, when using different Ollama LLMs (llama3, mistral, phi3, ...), I noticed that Ragas limits the context window to roughly 500-700 tokens. I want it to handle larger document sizes, similar to 'gpt-3.5-16k'. Is this possible?
```python
# setup ragas evaluation ----------------------------------------------------
# import metrics
from ragas.metrics import faithfulness, answer_relevancy, context_precision
from ragas.metrics.critique import SUPPORTED_ASPECTS, harmfulness
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings

# wrappers
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_community.chat_models import ChatOllama
from ragas import evaluate
from langchain_community.embeddings import OllamaEmbeddings

# information found here: https://docs.ragas.io/en/latest/howtos/customisations/bring-your-own-llm-or-embs.html

# metrics you chose
metrics = [answer_relevancy]

# util function to init Ragas metrics
def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

llm = ChatOllama(model="deepseek-coder-v2:latest")
emb = OllamaEmbeddings(model="nomic-embed-text")
# llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
# emb = OpenAIEmbeddings()

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)

async def score_with_ragas(query, chunks, answer):
    scores = {}
    for m in metrics:
        print(f"calculating {m.name}")
        # call each metric's ascore with a row dict containing the question,
        # contexts, and answer; the result is stored in 'scores' under the metric's name
        scores[m.name] = await m.ascore(
            row={"question": query, "contexts": chunks, "answer": answer}
        )
    return scores
```
```python
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define a global variable to store the retrieved context
context_retrieved = None

def inspect(state):
    """Print the state passed between Runnables in a LangChain chain and pass it on."""
    global context_retrieved
    context_retrieved = state.get("context")
    print(state)
    return state

# _search_query, retriever, prompt, and llm2 are defined elsewhere in the application
chain = (
    RunnableParallel(
        {
            "context": _search_query | retriever,
            "question": RunnablePassthrough(),
        }
    )
    | RunnableLambda(inspect)
    | prompt
    | llm2
    | StrOutputParser()
)
```
```python
import pprint

user_question = "What happened to Alice after she ate the cake?"
answer = chain.invoke({"question": user_question})
pprint.pprint(answer)

context_retrieved = '''
She ate a little bit, and said anxiously to herself, “Which way? Which
way?”, holding her hand on the top of her head to feel which way it was
growing, and she was quite surprised to find that she remained the same
size: to be sure, this generally happens when one eats cake, but Alice
had got so much into the way of expecting nothing but out-of-the-way
things to happen, that it seemed quite dull and stupid for life to go
on in the common way.
'''

# truncate the retrieved context to 700 characters before scoring
context_red_retrieved = context_retrieved[:700]

question, contexts, answer = user_question, context_red_retrieved, answer
eval_ans = await score_with_ragas(question, contexts, answer)
print(eval_ans)
```
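(A minimal standalone sketch of the same call for anyone reproducing this outside a notebook; the inputs below are placeholders. Note that Ragas metrics typically expect `contexts` to be a list of strings rather than a single string.)

```python
import asyncio

# placeholder inputs; in the snippet above, `contexts` is a sliced string,
# while ragas metrics usually expect a list of context strings
example_scores = asyncio.run(
    score_with_ragas(
        query="What happened to Alice after she ate the cake?",
        chunks=["She ate a little bit, and said anxiously to herself ..."],
        answer="placeholder answer produced by the chain",
    )
)
print(example_scores)
```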
Hey there, @TM02! 👋 I'm here to help with bugs, questions, and contributions, and I'll assist you while we wait for a human maintainer to join.
To configure Ragas to handle larger context windows with Ollama's LLMs, you need to adjust the `max_tokens` parameter when initializing your custom LLM. Here's how you can do it:

- Set the `max_tokens` parameter in your custom LLM initialization to handle larger context windows:
```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

llm = ChatOllama(model="deepseek-coder-v2:latest", max_tokens=16000)  # Adjust max_tokens as needed
emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```
- Ensure your `LangchainLLMWrapper` and `LangchainEmbeddingsWrapper` are correctly set up to use these custom models:
```python
from ragas.metrics import answer_relevancy
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings

metrics = [answer_relevancy]

def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

llm = ChatOllama(model="deepseek-coder-v2:latest", max_tokens=16000)  # Adjust max_tokens as needed
emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```
By setting the `max_tokens` parameter to a higher value, you can ensure that Ragas handles larger context windows similar to 'gpt-3.5-16k' [1].
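A caveat worth flagging: if I recall the `langchain_community` `ChatOllama` fields correctly, the context window is set via `num_ctx` (and the output length via `num_predict`) rather than an OpenAI-style `max_tokens`, which may simply be ignored. A minimal sketch under that assumption:

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# num_ctx sets the model's context window in tokens; num_predict caps the
# generated output length. Both values here are illustrative, not verified limits.
llm = ChatOllama(
    model="deepseek-coder-v2:latest",
    num_ctx=16384,
    num_predict=1024,
)
emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```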
@dosu
I tried your solution, but it still doesn't work. Perhaps it's due to the model's context window limitation or because Ragas only allows 'gpt-3.5-16k' to handle larger document sizes without reduction. When I exceed Ragas' context window limit, it keeps showing 'nan'.
To handle larger document sizes with Ollama's LLMs and resolve the 'nan' issue when exceeding the context window limit, you can use the `LangchainLLMWrapper` class with a properly configured `RunConfig`. Here's how you can do it:
- Configure the `LangchainLLMWrapper` with a larger context window:
```python
from ragas.run_config import RunConfig
from langchain_community.chat_models import ChatOllama
from ragas.llms.base import LangchainLLMWrapper

# Create a RunConfig with a larger timeout for long evaluations
run_config = RunConfig(timeout=60)  # Adjust timeout as needed

# Initialize the language model with the desired model name and context size
ollama_model = ChatOllama(model="deepseek-coder-v2:latest", max_tokens=16000)  # Adjust max_tokens as needed

# Wrap the language model with LangchainLLMWrapper
llm_wrapper = LangchainLLMWrapper(ollama_model, run_config)

# Now you can use llm_wrapper to generate text with the larger context window
```
- Ensure your metrics are initialized with the configured LLM:
```python
from ragas.metrics import answer_relevancy
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_community.embeddings import OllamaEmbeddings
from ragas.run_config import RunConfig

metrics = [answer_relevancy]

def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=llm_wrapper,
    embedding=LangchainEmbeddingsWrapper(emb),
)
```
- Handle the 'nan' issue by ensuring proper error handling in the output parser:

The `RagasoutputParser` class retries parsing if an `OutputParserException` is encountered. Ensure this mechanism is in place:
```python
class RagasoutputParser(PydanticOutputParser):
    async def aparse(self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 1):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                p_value = FIX_OUTPUT_FORMAT.format(prompt=prompt.to_string(), completion=result)
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output
```
By configuring the `LangchainLLMWrapper` with a larger context window and ensuring proper error handling, you should be able to handle larger document sizes and mitigate the 'nan' issue [1][2].
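If the scores still come back as `nan`, a low-effort diagnostic (assuming Ragas uses standard module-level `logging` loggers) is to raise the log level so the parser's "Failed to parse output" warning becomes visible instead of silently turning into a NaN score:

```python
import logging

# surface warnings from ragas modules (e.g. the output-parser retry giving up),
# which otherwise only show up as NaN scores in the results
logging.basicConfig(level=logging.INFO)
logging.getLogger("ragas").setLevel(logging.DEBUG)
```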
The same error still appears.
@TM02 this could be an issue with the model's context length, but it is recommended to use a better model for eval so that you can trust the model-graded results.
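A rough sketch of that suggestion, reusing the `init_ragas_metrics` helper from above and assuming OpenAI credentials are configured (the judge model name is only an example):

```python
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# use a stronger judge model for the evaluation itself (example model name);
# assumes OPENAI_API_KEY is set in the environment
judge_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
judge_emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

init_ragas_metrics(metrics, llm=judge_llm, embedding=judge_emb)
```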