
RAGAS with huggingface models

Open SalvatoreRa opened this issue 1 year ago • 8 comments

Describe the bug

I tried using RAGAS with a model other than OpenAI's. In general, whatever model I use, I get this error back:

File /opt/conda/lib/python3.10/site-packages/ragas/evaluation.py:237, in evaluate(dataset, metrics, llm, embeddings, callbacks, in_ci, is_async, run_config, raise_exceptions, column_map)
    235 results = executor.results()
    236 if results == []:
--> 237     raise ExceptionInRunner()
    239 # convert results to dataset_like
    240 for i, _ in enumerate(dataset):

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.

/opt/conda/lib/python3.10/site-packages/ipykernel/iostream.py:123: RuntimeWarning: coroutine 'as_completed.<locals>.sema_coro' was never awaited
  await self._event_pipe_gc()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

I worked around that with:

import nest_asyncio
nest_asyncio.apply()

However, it no longer raises an error; instead it returns: {'faithfulness': nan, 'answer_relevancy': nan, 'context_utilization': nan}
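
As a general debugging suggestion (not something from this thread), re-running with raise_exceptions=True, using the same dataset and evaluator as in the reproduction below, should surface the underlying generation or parsing error instead of silently collapsing it into nan scores:

# Sketch: surface the underlying error instead of silent nan scores
result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    raise_exceptions=True,   # let the real exception propagate
    metrics=[faithfulness],  # start with a single metric to isolate the failure
)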

Code to Reproduce

import pandas as pd
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer
from langchain import HuggingFacePipeline
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization
)
from ragas import evaluate
from datasets import Dataset

import nest_asyncio
nest_asyncio.apply()

# embedding model
embedding_model = SentenceTransformer("microsoft/mpnet-base")

# evaluator
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = 0  # Use GPU (0 is typically the first GPU device)

pipe = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    device=device,          # run the pipeline on the first GPU
    temperature=0.1,
    do_sample=True,
    max_new_tokens=200,
    repetition_penalty=1.1  # without this the output begins repeating
)

evaluator = HuggingFacePipeline(pipeline=pipe)

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'], 
    ['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],
}
dataset = Dataset.from_dict(data_samples)

# ragas
result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    raise_exceptions=False,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_utilization,
    ]
)

print(result)

Error trace
No error; it simply returns nan for every metric.

Expected behavior
It should return the evaluation metrics.

Thank you very much for your help

SalvatoreRa avatar Jul 11 '24 13:07 SalvatoreRa

Hey there, @SalvatoreRa! I'm Dosu, a bot here to assist you with any bugs, questions, or contributions. I'm here to help while you wait for a human maintainer. Let's tackle this issue together!

To resolve the issue of receiving nan for all evaluation metrics when using RAGAS with a HuggingFace model, ensure that the dataset is correctly formatted and all required columns are present and correctly named. The dataset should have the following columns: question, ground_truth, answer, and contexts. Additionally, make sure that the column names match the expected names or provide a column_map to map your dataset's column names to the expected ones.

Here is an example of how to use the evaluate function with a properly formatted dataset:

from ragas import evaluate
from datasets import Dataset

# Example dataset
data = {
    'question': ["What is the capital of France?", "Who wrote '1984'?"],
    'ground_truth': ["Paris", "George Orwell"],  # one string per row
    'answer': ["Paris", "George Orwell"],
    'contexts': [["Paris is the capital of France."], ["George Orwell wrote '1984'."]]
}

dataset = Dataset.from_dict(data)

# Evaluate the dataset
result = evaluate(dataset)
print(result)

Ensure that the dataset columns are correctly named or use the column_map parameter to map your dataset's column names to the expected ones:

column_map = {
    'question': 'your_question_column',
    'ground_truth': 'your_ground_truth_column',
    'answer': 'your_answer_column',
    'contexts': 'your_contexts_column'
}

result = evaluate(dataset, column_map=column_map)
print(result)

If the dataset is correctly formatted and the issue persists, check if the metrics, LLM, and embeddings are properly initialized and not None [1].
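
For example, with a local HuggingFace pipeline, one likely cause of nan scores (an assumption based on the code above, not something confirmed here) is passing objects ragas cannot use directly, such as a raw SentenceTransformer. A minimal sketch that wraps both the LLM and the embeddings in the LangChain/ragas wrappers might look like this:

from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# Use a LangChain-compatible embeddings object instead of a raw SentenceTransformer
embeddings = LangchainEmbeddingsWrapper(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

# Wrap the HuggingFace pipeline so ragas sees a LangChain-style LLM
evaluator = LangchainLLMWrapper(HuggingFacePipeline(pipeline=pipe))

result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embeddings,
    metrics=[faithfulness, answer_relevancy, context_utilization],
)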

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Jul 11 '24 14:07 dosubot[bot]

I have tested with the mapping, but the results are the same. The dataset is already in the correct format:

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

SalvatoreRa avatar Jul 12 '24 07:07 SalvatoreRa

You have to downgrade your ragas version to 0.1.7; then it should work.

TM02 avatar Jul 22 '24 06:07 TM02

I have tried that; it looks like it takes a long time and then just goes out of memory. I tried this script in Google Colab:

!pip install datasets sentence_transformers langchain ragas==0.1.7 accelerate bitsandbytes langchain-huggingface
from datasets import load_dataset
import pandas as pd
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer
from langchain import HuggingFacePipeline
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization
)
from ragas import evaluate
from datasets import Dataset

import nest_asyncio
nest_asyncio.apply()

# embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2', device='cuda')

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",  # Automatically map layers to GPU(s)
    #attn_implementation="flash_attention_2", # if you have an ampere GPU
)

# Define the text generation pipeline using HuggingFace transformers
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=500, top_k=50, temperature=0.1)

# Wrap the pipeline in a HuggingFacePipeline object for the evaluator
evaluator = HuggingFacePipeline(pipeline=pipe)

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
dataset = amnesty_qa["eval"].select(range(5))

column_map = {
    'question': 'question',
    'ground_truth': 'ground_truth',
    'answer': 'answer',
    'contexts': 'contexts'
}

# ragas
result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    raise_exceptions=False,
    column_map=column_map,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_utilization,
    ]
)

print(result)

SalvatoreRa avatar Jul 23 '24 07:07 SalvatoreRa
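
One knob that sometimes helps with a single local pipeline (an assumption, not something verified in this thread) is ragas' RunConfig: lowering max_workers avoids hitting the pipeline with many concurrent requests, and a longer timeout gives slow local generation time to finish instead of timing out into nan. A rough sketch:

from ragas import RunConfig

# Limit concurrency so the local pipeline is not called in parallel,
# and allow more time per request for slow local generation.
run_config = RunConfig(timeout=600, max_workers=1)

result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    raise_exceptions=False,
    run_config=run_config,
    metrics=[faithfulness, answer_relevancy, context_utilization],
)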

Ragas with HuggingFace models doesn't work as intended because they don't handle async the way we expect. Could you try out Ollama or vLLM?

jjmachan avatar Jul 31 '24 07:07 jjmachan
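
For reference, a minimal sketch of the Ollama route might look like the following; it assumes a locally running Ollama server with the models already pulled (e.g. ollama pull mistral and ollama pull nomic-embed-text) and the langchain-community integrations installed, and is not a confirmed working example from this thread:

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

# Assumes `ollama serve` is running locally and the models have been pulled
evaluator = ChatOllama(model="mistral", temperature=0)
embeddings = OllamaEmbeddings(model="nomic-embed-text")

result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embeddings,
    metrics=[faithfulness, answer_relevancy, context_utilization],
)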

Ragas with HuggingFace models doesn't work as intended because they don't handle async the way we expect. Could you try out Ollama or vLLM?

I actually tried this before, but it did not work: https://github.com/mosh98/RAG_With_Models/blob/main/evaluation/RAGAS%20DEMO.ipynb

Do you happen to have a code snippet?

SalvatoreRa avatar Jul 31 '24 09:07 SalvatoreRa

@SalvatoreRa in that example it seems to be working, right? Or am I missing something?

jjmachan avatar Aug 06 '24 06:08 jjmachan

The notebook does not work, and it does not mention any package versions. Can we at least have a requirements.txt file for the examples?

arjunsridhar9720 avatar Nov 11 '25 20:11 arjunsridhar9720