AttributeError
I defined this function to not use the LLM calls, but it still gives me this error:
Exception has occurred: AttributeError
'list' object has no attribute 'get_sample_type'
File "C:\Users\Amin\OneDrive - unige.it\Documenti\projectss\ragas-prototype\src\rag_eva.py", line 222, in
my code is this:
# Define the custom LLM class
class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key
def _call(self, prompt: str) -> str:
"""
Process the prompt and return a result. This can be customized to
use a local model or perform any required logic.
"""
if not self.api_key:
return f"Processed: {prompt} (without API key)"
else:
# Handle LLM response if using an API
return f"Processed: {prompt} (with API key: {self.api_key})"
def generate_text(
self,
prompt: PromptValue,
n: int = 1,
temperature: float = 1e-8,
stop: t.Optional[t.List[str]] = None,
callbacks: t.List = []
) -> LLMResult:
# Synchronous generation logic
text = self._call(prompt)
return LLMResult(generations=[Generation(text=text)])
async def agenerate_text(
self,
prompt: PromptValue,
n: int = 1,
temperature: float = 1e-8,
stop: t.Optional[t.List[str]] = None,
callbacks: t.List = []
) -> LLMResult:
"""
Asynchronous method to generate text. This should allow for async processing.
"""
# Simulate an asynchronous call, here we directly call the sync method for now
text = self._call(prompt)  # _call is synchronous, so it is not awaited here
return LLMResult(generations=[Generation(text=text)])
To fix the AttributeError: 'list' object has no attribute 'get_sample_type', ensure that you are using an EvaluationDataset instance instead of a list when calling the evaluate function. Here's how you can create an EvaluationDataset correctly:
1. Convert your list to an EvaluationDataset: If you have a list of samples, you need to convert it to an EvaluationDataset:

   from ragas import EvaluationDataset

   # Assuming ragas_data is your list of samples
   evaluation_dataset = EvaluationDataset.from_list(ragas_data)

   # Use evaluation_dataset in the evaluate function
   evaluation_report = evaluate(evaluation_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)

2. Ensure your samples are structured correctly: Make sure each sample in your list is an instance of SingleTurnSample or MultiTurnSample, as required by the EvaluationDataset [1][2].
By following these steps, you should be able to resolve the error and proceed with your evaluation.
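For reference, a minimal sketch of what this can look like, assuming a ragas 0.2+ release where EvaluationDataset and SingleTurnSample exist (field names follow the 0.2 sample schema; adjust to your installed version):

from ragas import EvaluationDataset, SingleTurnSample

# Build samples explicitly (assumes ragas >= 0.2)
samples = [
    SingleTurnSample(
        user_input="What is the capital of France?",              # the question text
        retrieved_contexts=["Paris is the capital of France."],   # retrieved chunks
        response="Paris",                                          # the generated answer
    )
]
evaluation_dataset = EvaluationDataset(samples=samples)

# Or build directly from a list of dicts with the same keys
evaluation_dataset = EvaluationDataset.from_list(
    [{"user_input": "What is the capital of France?",
      "retrieved_contexts": ["Paris is the capital of France."],
      "response": "Paris"}]
)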
@dosu.
how to fix this error:
Exception has occurred: AttributeError
'list' object has no attribute 'rename_columns'
File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 196, in
To fix the AttributeError: 'list' object has no attribute 'rename_columns', you need to convert your list to a compatible data structure that supports the rename_columns method. Here's how you can do it:
1. Convert your list to an EvaluationDataset: If you have a list of samples, convert it to an EvaluationDataset:

   from ragas import EvaluationDataset

   # Assuming ragas_data is your list of samples
   evaluation_dataset = EvaluationDataset.from_list(ragas_data)

   # Use evaluation_dataset in the evaluate function
   evaluation_report = evaluate(evaluation_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)

2.
Ensure that each sample in your list is structured correctly as required by the EvaluationDataset [1][2].
@dosu. but this does not exist in the Ragas library: EvaluationDataset. which version of ragas are you using?
The EvaluationDataset class might not exist in the current version of the Ragas library. Instead, you can use the available methods to handle datasets or lists of samples. Here are some relevant methods:
- from_list(): Creates a dataset from a list of dictionaries.
- to_pandas(): Converts samples to a Pandas DataFrame.
- to_hf_dataset(): Converts samples to a Hugging Face dataset.
Ensure your samples are instances of SingleTurnSample or MultiTurnSample for compatibility [1][2].
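A quick, hedged sketch of how those helpers are typically used (method availability depends on the ragas version you have installed):

from ragas import EvaluationDataset

dataset = EvaluationDataset.from_list(samples)   # samples: a list of sample-shaped dicts
df = dataset.to_pandas()                          # inspect as a Pandas DataFrame
hf_ds = dataset.to_hf_dataset()                   # convert to a Hugging Face Dataset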
@dosu. I face this error: AttributeError: 'list' object has no attribute 'rename_columns', and my code is this:
# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}
The important thing is that I have lists of floats (computed embeddings).
To resolve the AttributeError: 'list' object has no attribute 'rename_columns', you need to convert your list of dictionaries into a compatible dataset format that supports the rename_columns method. Here's how you can do it using the from_list method of the EvaluationDataset class:
1. Convert your list to an EvaluationDataset: First, ensure your data is structured correctly and then convert it:

   from ragas import EvaluationDataset

   # Assuming ragas_data is your list of dictionaries
   evaluation_dataset = EvaluationDataset.from_list(ragas_data)

   # Now you can use evaluation_dataset in the evaluate function
   evaluation_report = evaluate(evaluation_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)

2. Ensure your data structure is correct: Make sure each dictionary in your list has the correct keys and values as required by the EvaluationDataset:

   ragas_data = [
       {
           "question": entry["text_vector_1"],
           "answer": entry["text_vector_2"],
           "contexts": entry["chunks"]
       }
       for entry in data_set
   ]
This should help you avoid the AttributeError by using a compatible dataset structure [1].
@dosu. how to fix this error:

Exception has occurred: AttributeError
'list' object has no attribute 'rename_columns'
File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 196, in
    evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)  # Pass your custom LLM here
AttributeError: 'list' object has no attribute 'rename_columns'
I have a similar issue when using Pandas DataFrame.
Error:
Traceback (most recent call last):
File "/Users/tobiasmolland/GitHub/Larry-Llama/scripts/ragas_generate_testset.py", line 49, in
Code used:
# import dependencies
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# set models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
# create generator with models
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
# generate testset
testset = generator.generate_with_langchain_docs(documents, test_size=5, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
df = testset.to_pandas()
# evaluate testset
from ragas.metrics import (
answer_relevancy,
faithfulness,
context_recall,
context_precision,
)
from ragas import evaluate
result = evaluate(
df,
metrics=[
context_precision,
faithfulness,
answer_relevancy,
context_recall,
],
)
result
Any solution to this yet?
no, I changed the format multiple times but it still gives me the error. I will share this with you if I find any solution.
To resolve the AttributeError: 'DataFrame' object has no attribute 'rename_columns', you need to convert your Pandas DataFrame to a format that supports the rename_columns method. Here's how you can do it:
1. Convert the DataFrame to a compatible dataset: Use the from_pandas method if available, or convert the DataFrame to a list of dictionaries and then to a compatible dataset format:

   from ragas import EvaluationDataset

   # Convert DataFrame to a list of dictionaries
   ragas_data = df.to_dict(orient='records')

   # Create an EvaluationDataset
   evaluation_dataset = EvaluationDataset.from_list(ragas_data)

   # Use evaluation_dataset in the evaluate function
   result = evaluate(
       evaluation_dataset,
       metrics=[
           context_precision,
           faithfulness,
           answer_relevancy,
           context_recall,
       ],
   )

2. Ensure your data structure is correct: Make sure each dictionary in your list has the correct keys and values as required by the EvaluationDataset.
This approach should help you avoid the AttributeError by using a compatible dataset structure. If EvaluationDataset is not available, ensure you are using the correct version of the library or check for any updates that might include this functionality.
@dosu. the EvaluationDataset does not exist in my ragas library.
If the EvaluationDataset class is not available in your version of the Ragas library, you can handle datasets using alternative methods. Here’s how you can proceed:
1. Convert Your Data: If you have a list of dictionaries, you can convert it to a format compatible with the evaluation functions. Use methods like to_hf_dataset() if available.

2. Ensure Correct Structure: Make sure your data is structured correctly with keys and values that match the expected input for evaluation functions.

3. Use Available Methods: Check for methods like from_list() or to_pandas() to convert your data into a compatible format.
If these methods are not available, consider checking the library documentation or updating to a version that includes these features.
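One quick way to check which API surface you actually have installed (a minimal sketch using only the standard library):

from importlib.metadata import version

print(version("ragas"))  # shows the installed release, e.g. 0.1.x vs 0.2.x

try:
    from ragas import EvaluationDataset  # only present in newer releases
    print("EvaluationDataset is available")
except ImportError:
    print("EvaluationDataset is not available in this ragas version")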
@dosu. this part of my code is related to my data set. take a look at it and tell me which part of it requires changes.
# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Create a Dataset using the Hugging Face datasets library
ragas_dataset = Dataset.from_dict(formatted_data)
To ensure your dataset is compatible with the Ragas evaluation framework, you need to make sure it matches the expected input structure. Here's how you can adjust your code:
1. Ensure Correct Structure: The question, answer, and contexts fields should be structured as follows:
   - question: A string.
   - answer: A string.
   - contexts: A list of strings.

2. Convert to Compatible Format: If you're using the Hugging Face datasets library, ensure the dataset is structured correctly before passing it to the evaluate function.
Here's a revised version of your code:
from datasets import Dataset
# Assuming data_set is a list of dictionaries
ragas_data = [
{
"question": entry["text_vector_1"], # Should be a string
"answer": entry["text_vector_2"], # Should be a string
"contexts": entry["chunks"] # Should be a list of strings
}
for entry in data_set
]
# Create the required structure for Dataset
formatted_data = {
"question": [entry["question"] for entry in ragas_data],
"contexts": [entry["contexts"] for entry in ragas_data],
"answer": [entry["answer"] for entry in ragas_data]
}
# Create a Dataset using the Hugging Face `datasets` library
ragas_dataset = Dataset.from_dict(formatted_data)
# Ensure the dataset is compatible with the evaluate function
# You may need to convert it to a format that the evaluate function expects
Make sure each entry in ragas_data has the correct types for question, answer, and contexts. If the evaluate function requires a specific dataset type, you might need to convert it accordingly.
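A small, hedged sanity check you can run before calling Dataset.from_dict to catch type problems early (plain Python, no ragas-specific API assumed):

# Verify each entry has the types the evaluate function expects:
# question/answer as str, contexts as a list of str.
for i, entry in enumerate(ragas_data):
    assert isinstance(entry["question"], str), f"row {i}: question is {type(entry['question'])}"
    assert isinstance(entry["answer"], str), f"row {i}: answer is {type(entry['answer'])}"
    assert isinstance(entry["contexts"], list) and all(
        isinstance(c, str) for c in entry["contexts"]
    ), f"row {i}: contexts must be a list of strings"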
@dosu.
I see this error:
Exception has occurred: ValueError
Dataset feature "question" should be of type string
File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 199, in
The format is like this:

formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Create a Dataset using the Hugging Face datasets library
ragas_dataset = Dataset.from_dict(formatted_data)
Ensure that all elements in the question list are strings. Here's how you can check and fix this:
1. Verify Data Types: Make sure each entry["question"] is a string. If it's not, convert it to a string.

2. Update Your Code: Modify the list comprehension to ensure all questions are strings:

   formatted_data = {
       "question": [str(entry["question"]) for entry in ragas_data],
       "contexts": [entry["contexts"] for entry in ragas_data],
       "answer": [entry["answer"] for entry in ragas_data]
   }

3. Check the Source Data: Ensure that the source data (data_set) has the correct types for each field.
By ensuring all question entries are strings, you should resolve the ValueError [1].
@dosu. but I have floats, not strings; I have the embeddings for the question and answer, and also for the chunks.
To handle embeddings (floats) for the question and answer fields in the Ragas evaluation framework, you need to ensure compatibility by using the column_map parameter in the evaluate function. Here's how you can do it:
1. Use the column_map Parameter: Map your embedding columns to the expected names.

2. Set Up Your Code: Ensure you have an appropriate embedding handler.
Here's an example:
from ragas import evaluate
from ragas.embeddings.base import BaseRagasEmbeddings
# Assuming `my_embeddings` is an instance of BaseRagasEmbeddings or a compatible class
# and `ragas_dataset` is your dataset with embeddings
column_map = {
"question": "question_embeddings",
"answer": "answer_embeddings"
}
result = evaluate(
dataset=ragas_dataset,
embeddings=my_embeddings,
column_map=column_map
)
print(result)
Make sure your dataset columns are correctly mapped and that you have the necessary embedding handler set up [1].
@dosu how to adapt or create a custom evaluation logic that directly supports embeddings.
To create a custom evaluation logic that directly supports embeddings in the Ragas framework, you can use the evaluate function with custom embeddings. Here's how:
1. Wrap Your Embeddings: Use LangchainEmbeddingsWrapper to wrap your custom embeddings.

2. Pass to Evaluate Function: Pass the wrapped embeddings to the evaluate function.
Here's a code snippet:
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from ragas.evaluation import evaluate
from datasets import Dataset
# Assuming `my_embeddings` is your custom embeddings object
wrapped_embeddings = LangchainEmbeddingsWrapper(my_embeddings)
# Create your dataset
dataset = Dataset.from_dict({
"question": ["What is AI?", "Explain machine learning."],
"ground_truth": ["Artificial Intelligence is...", "Machine learning is a subset of AI..."]
})
# Define your metrics
metrics = [...] # List of Metric objects
# Call the evaluate function
results = evaluate(
dataset=dataset,
metrics=metrics,
embeddings=wrapped_embeddings
)
print(results)
This setup allows you to integrate custom embeddings into the evaluation process directly [1].
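If you do not already have a LangChain embeddings object, a hedged sketch of one way to create and wrap one (this assumes the langchain-huggingface package and a sentence-transformers model, neither of which appears in the thread; substitute whatever embeddings you actually use):

from langchain_huggingface import HuggingFaceEmbeddings
from ragas.embeddings.base import LangchainEmbeddingsWrapper

# Load a LangChain-compatible embedding model (assumed example model)
hf_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
wrapped_embeddings = LangchainEmbeddingsWrapper(hf_embeddings)
# wrapped_embeddings can then be passed as embeddings=... to evaluate()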
@dosu but I wrote these lines of code, what is the point of it? What should I do with them now?

class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custom_embeddings: list = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.
Parameters:
model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
custom_embeddings (list): A list of pre-computed custom embeddings (optional).
"""
model_name = 'distilbert-base-uncased'
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModel.from_pretrained(model_name)
self.custom_embeddings = custom_embeddings # Store the custom embeddings
def embed_documents(self, texts: list) -> np.ndarray:
"""
Generate embeddings for a list of documents.
Parameters:
texts (list): A list of documents to embed.
Returns:
np.ndarray: An array of embeddings for the documents.
"""
if self.custom_embeddings is not None:
# If custom embeddings are provided, return those instead
return np.array(self.custom_embeddings)
# Generate new embeddings using the model if no custom embeddings are available
inputs = self.tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
outputs = self.model(**inputs)
# Use the pooled output or the CLS token as the embedding
embeddings = outputs.last_hidden_state[:, 0, :] # CLS token for sentence embedding
return embeddings.numpy() # Convert to NumPy array
def embed_query(self, query: str) -> np.ndarray:
"""
Generate an embedding for a single query.
Parameters:
query (str): The query to embed.
Returns:
np.ndarray: The embedding for the query.
"""
# If custom embeddings are provided, generate embedding based on those
if self.custom_embeddings is not None:
# You might want to handle how to relate the query to your custom embeddings
raise NotImplementedError("Custom query embeddings are not supported with provided custom embeddings.")
# Generate a new embedding using the model
inputs = self.tokenizer(query, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
outputs = self.model(**inputs)
# Use the pooled output or the CLS token as the embedding
embedding = outputs.last_hidden_state[:, 0, :] # CLS token for single query embedding
return embedding.numpy() # Convert to NumPy array
# Initialize the custom embeddings class
custom_embeddings = CustomHuggingFaceRagasEmbeddings(ragas_dataset)
ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name=model_name, custom_embeddings=custom_embeddings)
The code you wrote defines a custom embeddings class, CustomHuggingFaceRagasEmbeddings, which integrates Hugging Face models to generate or use pre-computed embeddings. Here's how you can use it:
1. Purpose of the Code:
   - Custom Embeddings: It allows you to use pre-computed embeddings if available, or generate new ones using a specified Hugging Face model.
   - Flexibility: You can switch between using pre-computed embeddings and generating new ones based on your needs.

2. Next Steps:
   - Instantiate the Class: Ensure you initialize the class with the correct parameters. If you have pre-computed embeddings, pass them to the custom_embeddings parameter.
   - Use in Evaluation: Pass an instance of this class to the evaluate function as the embeddings parameter.

3. Example Usage:

   # Initialize with a model name and optional custom embeddings
   custom_embeddings = CustomHuggingFaceRagasEmbeddings(
       model_name='distilbert-base-uncased',
       custom_embeddings=your_precomputed_embeddings  # Replace with your embeddings if available
   )

   # Use in the evaluate function
   result = evaluate(
       dataset=ragas_dataset,
       metrics=metrics,
       embeddings=custom_embeddings
   )

   print(result)
Ensure that your dataset and metrics are correctly set up to work with the embeddings. This setup allows you to leverage both pre-computed and dynamically generated embeddings in your evaluation process.
@dosu @jjmachan you don't understand my point. I created a custom embedding class to use with the custom evaluation function (the one that directly supports embeddings), but when I run it I see the error "the question must be a string". This is my code:
# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for the Hugging Face Dataset creation
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Define the column_map to match custom columns to expected ones
column_map = {
    "question": "question",  # Match to the correct column
    "answer": "answer",      # Match to the correct column
    "contexts": "contexts"   # Match to the correct column
}

# Create a Dataset using the Hugging Face datasets library
ragas_dataset = Dataset.from_dict(formatted_data)
model_name = 'distilbert-base-uncased'
class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custom_embeddings: list = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.
Parameters:
model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
custom_embeddings (list): A list of pre-computed custom embeddings (optional).
"""
model_name = 'distilbert-base-uncased'
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModel.from_pretrained(model_name)
self.custom_embeddings = custom_embeddings # Store the custom embeddings
def embed_documents(self, texts: list) -> np.ndarray:
"""
Generate embeddings for a list of documents.
Parameters:
texts (list): A list of documents to embed.
Returns:
np.ndarray: An array of embeddings for the documents.
"""
if self.custom_embeddings is not None:
# If custom embeddings are provided, return those instead
return np.array(self.custom_embeddings)
# Generate new embeddings using the model if no custom embeddings are available
inputs = self.tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
outputs = self.model(**inputs)
# Use the pooled output or the CLS token as the embedding
embeddings = outputs.last_hidden_state[:, 0, :] # CLS token for sentence embedding
return embeddings.numpy() # Convert to NumPy array
def embed_query(self, query: str) -> np.ndarray:
"""
Generate an embedding for a single query.
Parameters:
query (str): The query to embed.
Returns:
np.ndarray: The embedding for the query.
"""
# If custom embeddings are provided, generate embedding based on those
if self.custom_embeddings is not None:
# You might want to handle how to relate the query to your custom embeddings
raise NotImplementedError("Custom query embeddings are not supported with provided custom embeddings.")
# Generate a new embedding using the model
inputs = self.tokenizer(query, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
outputs = self.model(**inputs)
# Use the pooled output or the CLS token as the embedding
embedding = outputs.last_hidden_state[:, 0, :] # CLS token for single query embedding
return embedding.numpy() # Convert to NumPy array
# Initialize the custom embeddings class
custom_embeddings = CustomHuggingFaceRagasEmbeddings(ragas_dataset)
ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name=model_name, custom_embeddings=custom_embeddings)
# Define the custom LLM class
class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key
def _call(self, prompt: str) -> str:
"""
Process the prompt and return a result. This can be customized to
use a local model or perform any required logic.
"""
if not self.api_key:
return f"Processed: {prompt} (without API key)"
else:
# Handle LLM response if using an API
return f"Processed: {prompt} (with API key: {self.api_key})"
def generate_text(
self,
prompt: PromptValue,
n: int = 1,
temperature: float = 1e-8,
stop: t.Optional[t.List[str]] = None,
callbacks: t.List = []
) -> LLMResult:
# Synchronous generation logic
text = self._call(prompt)
return LLMResult(generations=[Generation(text=text)])
async def agenerate_text(
self,
prompt: PromptValue,
n: int = 1,
temperature: float = 1e-8,
stop: t.Optional[t.List[str]] = None,
callbacks: t.List = []
) -> LLMResult:
"""
Asynchronous method to generate text. This should allow for async processing.
"""
# Simulate an asynchronous call, here we directly call the sync method for now
text = self._call(prompt)  # _call is synchronous, so it is not awaited here
return LLMResult(generations=[Generation(text=text)])
custom_llm = CustomRagasLLM(api_key=None)

# Define the evaluation metrics
metrics = [context_utilization]
# Create a Dataset using the Hugging Face datasets library
# ragas_dataset = Dataset.from_dict(formatted_data)

# Run the evaluation
evaluation_report = evaluate(ragas_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm, column_map=column_map)  # Pass your custom LLM here

print("RAGAS Evaluation Report:")
print(evaluation_report)
To address the error "the question must be a string," you need to ensure that the question field in your dataset is a string, not an embedding or list of floats. Here's how you can adapt your setup:
1. Convert Embeddings to Strings: If your question field contains embeddings, you need to convert them to a string representation suitable for evaluation. This might involve using a placeholder or a descriptive string.

2. Modify the Dataset: Ensure that the question field in formatted_data is a string:

   formatted_data = {
       "question": [str(entry["question"]) for entry in ragas_data],  # Convert to string
       "contexts": [entry["contexts"] for entry in ragas_data],
       "answer": [entry["answer"] for entry in ragas_data]
   }

3. Use the column_map Correctly: Ensure that the column_map is correctly mapping your dataset fields to the expected names in the evaluation function.

4. Custom Embeddings Handling: If you want to use embeddings directly, ensure your evaluation logic can handle them. This might involve customizing the evaluation function to work with embeddings instead of strings.
By ensuring the question field is a string, you should resolve the error. If you need to evaluate using embeddings, consider adapting the evaluation logic to handle embeddings directly, possibly by modifying the evaluation function to accept and process embeddings.
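If the raw text is still available somewhere upstream of the embedding step, one hedged option (not shown in the thread; the question_text, answer_text, and chunk_texts keys below are assumptions about your source data) is to keep the strings in the dataset and hand the precomputed vectors only to the custom embeddings class:

# Hypothetical sketch: assumes data_set entries still carry the original strings
formatted_data = {
    "question": [str(entry["question_text"]) for entry in data_set],                 # assumed key
    "answer":   [str(entry["answer_text"]) for entry in data_set],                   # assumed key
    "contexts": [[str(c) for c in entry["chunk_texts"]] for entry in data_set],      # assumed key
}
# The precomputed vectors would then go to CustomHuggingFaceRagasEmbeddings(custom_embeddings=...),
# not into the Dataset columns.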
@dosu how can I Extend or Override the Evaluation Method? I want to create a custom evaluation function that bypasses text input and directly operates on embeddings. I want to Modify it so that it doesn't expect raw text but rather accepts embeddings as the primary data format. This requires modifying the parts of the evaluation pipeline that deal with text, and replacing them with operations on vectors.
To create a custom evaluation function that operates directly on embeddings, you can extend or override the existing evaluation method in the Ragas framework. Here's a step-by-step guide:
1. Modify the Evaluation Function: Create a custom function that processes embeddings instead of raw text.

2. Implement Custom Logic: Use the embeddings to perform your evaluation logic.
Here's an example of how you can achieve this:
from ragas.embeddings.base import BaseRagasEmbeddings
def custom_evaluate(
dataset,
metrics,
embeddings: BaseRagasEmbeddings,
raise_exceptions: bool = False,
column_map: t.Optional[t.Dict[str, str]] = None,
run_config: t.Optional[RunConfig] = None,
) -> Result:
if embeddings is None:
raise ValueError("Embeddings must be provided for custom evaluation!")
# Extract embeddings for questions and answers
query_embeddings = [embeddings.get_embedding(q) for q in dataset["question"]]
answer_embeddings = [embeddings.get_embedding(a) for a in dataset["answer"]]
# Implement your custom evaluation logic here
results = []
for qe, ae in zip(query_embeddings, answer_embeddings):
similarity = calculate_similarity(qe, ae) # Define this function as needed
results.append(similarity)
# Return results in the expected format
return results
# Example usage
# Ensure your dataset and embeddings are set up correctly
result = custom_evaluate(
dataset=your_dataset,
metrics=your_metrics,
embeddings=your_custom_embeddings
)
print(result)
Key Points:
- Embeddings Handling: Ensure your embeddings object can provide embeddings for the data.
- Custom Logic: Implement calculate_similarity or any other logic needed to evaluate the embeddings.
- Dataset Structure: Ensure your dataset is structured to provide the necessary data for embedding extraction.
This approach allows you to bypass text processing and directly work with embeddings in your evaluation pipeline [1].
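The calculate_similarity helper referenced above is left undefined in the snippet; a minimal sketch using cosine similarity (plain NumPy, one of several reasonable choices):

import numpy as np

def calculate_similarity(query_embedding, answer_embedding) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    q = np.asarray(query_embedding, dtype=float).ravel()
    a = np.asarray(answer_embedding, dtype=float).ravel()
    denom = np.linalg.norm(q) * np.linalg.norm(a)
    return float(np.dot(q, a) / denom) if denom else 0.0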
from ragas import EvaluationDataset
no my dataset is not a list or str, pandas is not
I have a similar issue when using Pandas DataFrame.
Error:
Traceback (most recent call last):
  File "/Users/tobiasmolland/GitHub/Larry-Llama/scripts/ragas_generate_testset.py", line 49, in
    result = evaluate(
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/ragas/_analytics.py", line 129, in wrapper
    result = func(*args, **kwargs)
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/ragas/evaluation.py", line 161, in evaluate
    dataset = remap_column_names(dataset, column_map)
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/ragas/validation.py", line 19, in remap_column_names
    return dataset.rename_columns(inverse_column_map)
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'rename_columns'
Code used: the same TestsetGenerator / evaluate(df, ...) script shown earlier in this thread.

Any solution to this yet?
Use the column_map method.
To resolve the AttributeError: 'DataFrame' object has no attribute 'rename_columns', you should use the remap_column_names function from the Ragas library. This function is designed to handle column renaming for datasets used in the evaluate function.
Here's how you can apply it:
1. Use remap_column_names: Ensure you have a column_map dictionary that maps your current DataFrame column names to the expected ones.

2. Apply the Function:

   from ragas.validation import remap_column_names

   # Define your column map
   column_map = {
       "current_question_column": "question",
       "current_answer_column": "answer",
       "current_contexts_column": "contexts"
   }

   # Remap the DataFrame columns
   df = remap_column_names(df, column_map)

   # Now use the DataFrame in the evaluate function
   result = evaluate(
       df,
       metrics=[
           context_precision,
           faithfulness,
           answer_relevancy,
           context_recall,
       ],
   )
Ensure your column_map correctly reflects the current and expected column names [1].
Hi! Something similar happens to my evaluation dataset, which is a dataframe with columns: ['user_input', 'reference_contexts', 'reference', 'synthesizer_name'], created with generator.generate_with_langchain_docs. The error is:
AttributeError: 'DataFrame' object has no attribute 'get_sample_type'.
My code:
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, SemanticSimilarity
from ragas import evaluate
generator_embeddings = LangchainEmbeddingsWrapper(embedder)
generator_llm = LangchainLLMWrapper(llm)
metrics = [
LLMContextRecall(llm=generator_llm),
FactualCorrectness(llm=generator_llm),
Faithfulness(llm=generator_llm),
SemanticSimilarity(embeddings=generator_embeddings)
]
results = evaluate(dataset=df_test, metrics=metrics)