Ragas evaluation
Faithfulness is not callable
To resolve the issue where "Faithfulness is not callable" in the Ragas evaluation, you need to ensure that you are instantiating the Faithfulness class correctly. The error suggests that you might be trying to call an instance of Faithfulness as if it were a function.
Here is the correct way to instantiate and use the Faithfulness class:
faithfulness = Faithfulness()
Make sure you are not trying to call faithfulness as a function. For example, if you have something like this:
score = faithfulness()
You should change it to:
# Assuming you have a sample and callbacks defined
score = await faithfulness._single_turn_ascore(sample, callbacks)
Ensure that you are using the methods provided by the Faithfulness class correctly [1].
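If the goal is simply a faithfulness score over a dataset, here is a minimal sketch of passing the metric instance (not the class) to evaluate. It assumes a Hugging Face Dataset with question, contexts, and answer columns and a configured default LLM (for example, OPENAI_API_KEY set in the environment); the sample row is illustrative only:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import Faithfulness

faithfulness = Faithfulness()  # instantiate the metric; do not call the instance like a function

dataset = Dataset.from_dict({
    'question': ['When was the first super bowl?'],
    'contexts': [['The First AFL–NFL World Championship Game was played on January 15, 1967.']],
    'answer': ['The first super bowl was held on January 15, 1967.'],
})

# Pass the metric *instance* in the metrics list
result = evaluate(dataset, metrics=[faithfulness])
print(result)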
Ok got it
Why is an empty DataFrame returned?

llm, embed_model = init_llm_ollama(base_url, bearer_token)

def evaluate(
    dataset: List[LLMTestCase],
    metrics: List[Callable],
    llm: llm,
    embeddings: embed_model,
    raise_exceptions: bool = True,
) -> pd.DataFrame:
    results = []

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    return results_df

# Sample data
data = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts': [
        ['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'],
        ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference']
    ],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
}
data_samples = Dataset.from_dict(data)

# Example usage
test_cases = [
    LLMTestCase(
        question=question,
        answer=answer,
        contexts=context,
        ground_truth=ground_truth
    )
    for question, answer, ground_truth, context in zip(
        data_samples['question'],
        data_samples['answer'],
        data_samples['ground_truth'],
        data_samples['contexts']
    )
]

# List of metrics functions
metrics = [
    answer_relevancy,
    # context_precision,
    # faithfulness,
    # context_recall
]

# Evaluate
async def main():
    results_df = evaluate(
        dataset=test_cases,
        metrics=[Faithfulness],
        llm=llm,
        embeddings=embed_model,
        raise_exceptions=False
    )

    print(results_df)
    results_df.to_csv(r'C:\TestingBot\Scripts\V1\myfile.csv', sep=',')

# Run the async main function
asyncio.run(main())
An empty DataFrame is returned when evaluating the dataset with the evaluate function if the dataset specified by `dataset_name` does not exist in LangSmith. The function raises a ValueError when the dataset is not found, which means the dataset must be present in LangSmith for the evaluation to proceed [1].
Additionally, the evaluate function in the Ragas framework requires the dataset to be in a specific format with columns such as question, contexts, answer, and ground_truth. If the dataset does not conform to this format or if the required columns are missing, the function will raise a ValueError, leading to an empty or non-existent result [2].
Ensure that your dataset is correctly formatted and exists in LangSmith. Also, make sure you have set the necessary environment variables, such as OPENAI_API_KEY, if you are using metrics that require access to external APIs like OpenAI.
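For reference, here is a minimal sketch of a wrapper that actually produces rows. It assumes the `data_samples`, `llm`, and `embed_model` objects from your snippet are valid Ragas-compatible wrappers, and the wrapper name `run_evaluation` is just an illustration; it delegates to `ragas.evaluate` and converts the returned Result to a DataFrame with `to_pandas()`:

import pandas as pd
from ragas import evaluate as ragas_evaluate
from ragas.metrics import answer_relevancy, faithfulness

def run_evaluation(dataset, metrics, llm, embeddings, raise_exceptions=True) -> pd.DataFrame:
    # Let Ragas do the scoring, then convert its Result object into a DataFrame
    result = ragas_evaluate(
        dataset,
        metrics=metrics,
        llm=llm,
        embeddings=embeddings,
        raise_exceptions=raise_exceptions,
    )
    return result.to_pandas()

results_df = run_evaluation(data_samples, [faithfulness, answer_relevancy], llm, embed_model)
print(results_df)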
Does Ragas support the llama3.1:8b model? I am getting NaN output.
Common reasons for getting NaN outputs in the Ragas evaluation framework include:
- Exceptions during metric evaluation: if any metric fails during evaluation and `raise_exceptions` is set to `False`, the evaluation returns `np.nan` for the row that failed (see the sketch just after this list for surfacing the underlying error).
- Validation failures: if the required columns for the metrics are missing, or the columns are in the wrong format, this can lead to NaN outputs.
- Unsupported sample type: if the sample type is unsupported, it can cause issues in the evaluation process, potentially leading to NaN outputs.
- Initialization issues: if the language model (LLM) or embeddings are not properly initialized, metrics that depend on them can return NaN [1].
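To see the underlying exception instead of a silent NaN, one option is to re-run with raise_exceptions=True so the first metric failure is raised with a full traceback. A short sketch, assuming the dataset, metric, and Ollama-backed wrappers from the earlier snippets:

from ragas import evaluate
from ragas.metrics import faithfulness

# Fail loudly instead of silently writing NaN into the results
result = evaluate(
    data_samples,
    metrics=[faithfulness],
    llm=llm,
    embeddings=embed_model,
    raise_exceptions=True,
)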
Regarding support for the llama3.1:8b model, I couldn't find specific information in the repository. However, you can refer to the example code for integrating LlamaIndex with Ragas, which might help you set up and evaluate your model correctly [2].
Python version: 3.11, Ragas: 0.1.16
Please share the correct evaluation metrics.
The correct evaluation metrics to use with Ragas version 0.1.16 in Python 3.11 are:
- `context_precision`
- `context_recall`
- `faithfulness`
- `answer_relevancy`
- `context_utilization`
- `context_entity_recall`
- `noise_sensitivity`
- `summarization_score`
- `answer_semantic_similarity`
- `answer_correctness`
You can import and use these metrics as follows:
from ragas.metrics import (
answer_relevancy,
faithfulness,
context_recall,
context_precision,
context_utilization,
context_entity_recall,
noise_sensitivity,
summarization_score,
answer_semantic_similarity,
answer_correctness,
)
These metrics measure various aspects of your Retrieval Augmented Generation (RAG) systems:
- Faithfulness - Measures the factual consistency of the answer to the context based on the question.
- Context_precision - Measures how relevant the retrieved context is to the question, conveying the quality of the retrieval pipeline.
- Answer_relevancy - Measures how relevant the answer is to the question.
- Context_recall - Measures the retriever's ability to retrieve all necessary information required to answer the question.
- Context_utilization - Evaluates how effectively the retrieved context is used in generating the answer.
- Context_entity_recall - Measures the recall of entities in the retrieved context.
- Noise_sensitivity - Assesses the robustness of the system to noisy inputs.
- Summarization_score - Evaluates the quality of summaries generated by the system.
- Answer_semantic_similarity - Measures the semantic similarity between the generated answer and the expected answer.
- Answer_correctness - Evaluates the correctness of the generated answer.
To run the evaluation on your dataset with these metrics, you can use the evaluate function provided by Ragas:
from ragas import evaluate
from datasets import Dataset
# Example dataset
dataset = Dataset.from_dict({
'question': ["What is the capital of France?"],
'contexts': [["Paris is the capital of France."]],
'answer': ["Paris"],
'ground_truth': [["Paris"]]
})
# Run evaluation
result = evaluate(dataset)
print(result)
This will give you a comprehensive view of your RAG system's performance based on the specified metrics [1][2][3].
Error while evaluating: "verify that the data type in your code aligns with the expected pyarrow data types."
To ensure that your data types align with the expected pyarrow data types in the Ragas evaluation framework, verify that your dataset columns have the following types:
- `question`: `list<string>`
- `contexts`: `list<list<string>>`
- `answer`: `list<string>`
- `ground_truth`: `list<list<string>>`
Here's an example of how to prepare your dataset with the correct data types:
from datasets import Dataset
# Example dataset
data = {
'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
'contexts': [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'],
['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference']],
'ground_truth': [['The first superbowl was held on January 15, 1967'], ['The New England Patriots have won the Super Bowl a record six times']]
}
# Convert to Hugging Face Dataset
dataset = Dataset.from_dict(data)
# Verify the data types
print(dataset)
# Example usage with Ragas evaluation
from ragas import evaluate
from ragas.metrics import (
answer_relevancy,
faithfulness,
context_recall,
context_precision,
context_utilization,
context_entity_recall,
noise_sensitivity,
summarization_score,
answer_semantic_similarity,
answer_correctness,
)
# List of metrics
metrics = [
answer_relevancy,
faithfulness,
context_recall,
context_precision,
context_utilization,
context_entity_recall,
noise_sensitivity,
summarization_score,
answer_semantic_similarity,
answer_correctness,
]
# Run evaluation
result = evaluate(dataset, metrics=metrics)
print(result)
Ensure that your dataset conforms to these data types to avoid errors during evaluation [1][2][3].
'dict' object has no attribute 'rename_columns'
You are getting the error "'dict' object has no attribute 'rename_columns'" because the rename_columns method is being called on a dictionary object instead of a Dataset object. The rename_columns method is a method of the Dataset class from the datasets library, not a dictionary.
In the provided code, the rename_columns method is used in the convert_v1_to_v2_dataset and convert_v2_to_v1_dataset functions, which expect a Dataset object as their input. If you pass a dictionary to these functions, you will encounter this error.
Here is the relevant code snippet:
def convert_v1_to_v2_dataset(dataset: Dataset) -> Dataset:
    columns_map = {v: k for k, v in REQUIRED_COLS_v1.items() if v in dataset.features}
    return dataset.rename_columns(columns_map)

def convert_v2_to_v1_dataset(dataset: Dataset) -> Dataset:
    columns_map = {k: v for k, v in REQUIRED_COLS_v1.items() if k in dataset.features}
    return dataset.rename_columns(columns_map)
Additionally, you can refer to the remap_column_names function from the ragas library, which also demonstrates how to remap column names in a Dataset object:
def remap_column_names(dataset: Dataset, column_map: dict[str, str]) -> Dataset:
    """
    Remap the column names in case dataset uses different column names
    """
    inverse_column_map = {v: k for k, v in column_map.items()}
    return dataset.rename_columns(inverse_column_map)
Make sure that you are passing a Dataset object to these functions, not a dictionary [1][2].
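If you currently have a plain dict, wrap it in a Dataset before calling any of these helpers. A short sketch (the 'query' column name is just an illustration of something you might want to rename):

from datasets import Dataset

raw = {
    'query': ['When was the first super bowl?'],
    'answer': ['The first superbowl was held on Jan 15, 1967'],
    'contexts': [['The First AFL–NFL World Championship Game was played on January 15, 1967.']],
    'ground_truth': ['The first superbowl was held on January 15, 1967'],
}

dataset = Dataset.from_dict(raw)                         # a Dataset, not a dict
dataset = dataset.rename_columns({'query': 'question'})  # rename_columns now exists
print(dataset.column_names)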
AttributeError: 'LLMTestCase' object has no attribute 'items'

data = LLMTestCase(
    question=query,
    answer=actual_output,
    ground_truth="Vector databases store and provide access to unstructured data",
    contexts=retrieval_context,
)
The AttributeError: 'LLMTestCase' object has no attribute 'items' suggests that somewhere in your code, an LLMTestCase object is being treated as a dictionary. This typically happens when a method or function expects a dictionary but receives an LLMTestCase object instead.
To resolve this, ensure that you are correctly using the LLMTestCase class and not mistakenly treating it as a dictionary. Here is an example of how to correctly instantiate and use an LLMTestCase object:
# Assuming LLMTestCase is defined correctly
data = LLMTestCase(
question=query,
answer=actual_output,
ground_truth="Vector databases store and provide access to unstructured data",
contexts=retrieval_context,
)
# Example usage
print(data.question) # Accessing attributes of LLMTestCase
print(data.answer)
print(data.ground_truth)
print(data.contexts)
If you need to convert an LLMTestCase object to a dictionary for some reason, you can manually create the dictionary:
data_dict = {
'question': data.question,
'answer': data.answer,
'ground_truth': data.ground_truth,
'contexts': data.contexts,
}
# Now you can use data_dict as a dictionary
Ensure that any function or method expecting a dictionary receives data_dict instead of the LLMTestCase object directly. This should resolve the AttributeError.
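If the test cases eventually need to be scored by Ragas (which expects a column-oriented dataset rather than test-case objects), one option, sketched here assuming each LLMTestCase exposes the question, answer, contexts, and ground_truth attributes used above, is to build a Hugging Face Dataset from a list of them:

from datasets import Dataset

test_cases = [data]  # list of LLMTestCase objects

dataset = Dataset.from_dict({
    'question': [tc.question for tc in test_cases],
    'answer': [tc.answer for tc in test_cases],
    'contexts': [tc.contexts for tc in test_cases],      # each entry should be a list of strings
    'ground_truth': [tc.ground_truth for tc in test_cases],
})
print(dataset)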
ValueError: Dataset feature "contexts" should be of type Sequence[string], got <class 'datasets.features.features.Sequence'>

data = {
    'question': [query],              # List with a single question
    'answer': [actual_output],        # List with a single answer
    'contexts': [retrieval_context],  # List with a single list of contexts (retrieval_context should be a list of strings)
    'ground_truth': [
        "Vector databases store and provide access to unstructured data, such as text or images, in the form of their vector embeddings. "
        "Because similar objects are close together in vector space, the similarity of data objects can be calculated based on the distance between the data object’s vector embeddings."
    ]  # List with a single ground truth
}
To resolve the ValueError: "Dataset feature 'contexts' should be of type Sequence[string], got <class 'datasets.features.features.Sequence'>", you need to ensure that the contexts field is correctly specified as a sequence of strings. Here is how you can adjust your dataset creation to meet the expected format:
- Ensure that `retrieval_context` is a list of strings.
- Explicitly define the feature types when creating the dataset.
Here is the corrected code snippet:
from datasets import Dataset, Features, Sequence, Value
# Ensure retrieval_context is a list of strings
retrieval_context = [
"Context 1",
"Context 2",
# Add more contexts as needed
]
# Define your data
data = {
'question': [query], # List with a single question
'answer': [actual_output], # List with a single answer
'contexts': [retrieval_context], # List with a single list of contexts
'ground_truth': [
"Vector databases store and provide access to unstructured data, such as text or images, in the form of their vector embeddings. "
"Because similar objects are close together in vector space, the similarity of data objects can be calculated based on the distance between the data object’s vector embeddings."
] # List with a single ground truth
}
# Define the features explicitly (each feature describes the type of a single example/row)
features = Features({
    'question': Value('string'),
    'answer': Value('string'),
    'contexts': Sequence(Value('string')),
    'ground_truth': Value('string')
})
# Create the dataset
dataset = Dataset.from_dict(data, features=features)
# Verify the dataset
print(dataset)
This code ensures that the contexts field is recognized as a sequence of strings, which should resolve the ValueError [1][2][3].
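To double-check the arrow types before handing the dataset to Ragas, you can inspect dataset.features; the exact repr varies with the datasets version, but it should look roughly like this:

print(dataset.features)
# e.g. {'question': Value(dtype='string'), 'answer': Value(dtype='string'),
#       'contexts': Sequence(feature=Value(dtype='string')), 'ground_truth': Value(dtype='string')}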
KeyError('summary')
Exception raised in Job[2]: TypeError(CustomOllama.agenerate_text() got an unexpected keyword argument 'n')
You are encountering a KeyError('summary') and a TypeError stating that CustomOllama.agenerate_text() got an unexpected keyword argument 'n'.
1. `KeyError('summary')`
2. `TypeError(CustomOllama.agenerate_text() got an unexpected keyword argument 'n')`: the `CustomOllama` class's `agenerate_text` method does not support the `n` argument. Update the method signature to include the `n` argument:
async def agenerate_text(
    self,
    prompt: PromptValue,
    n: int = 1,
    temperature: t.Optional[float] = None,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.List = [],
) -> LLMResult:
    # Your implementation here
This should resolve the issues you are facing [3][4].
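For reference, here is a minimal, self-contained sketch of an n-aware agenerate_text. The import paths are the ones used by ragas 0.1.x and langchain-core and may differ in your version, and _call_ollama is a hypothetical placeholder for your actual request to the Ollama server; if you subclass Ragas's BaseRagasLLM, a synchronous generate_text is required as well:

import typing as t

from langchain_core.outputs import Generation, LLMResult
from ragas.llms.prompt import PromptValue  # assumed import path for ragas 0.1.x


class CustomOllama:
    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = [],
    ) -> LLMResult:
        # Produce n completions and wrap them the way Ragas expects:
        # LLMResult.generations is a list per prompt, each holding one Generation per completion.
        texts = [await self._call_ollama(prompt.to_string()) for _ in range(n)]
        return LLMResult(generations=[[Generation(text=text) for text in texts]])

    async def _call_ollama(self, prompt_str: str) -> str:
        # Hypothetical placeholder: replace with your real HTTP call to the Ollama server.
        return "stub completion for: " + prompt_str[:40]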
AttributeError('CustomOllama' object has no attribute 'model_name')
The AttributeError: 'CustomOllama' object has no attribute 'model_name' indicates that the CustomOllama class is missing the model_name attribute. To resolve this, you need to ensure that the CustomOllama class includes the model_name attribute.
Here is an example of how you might define the CustomOllama class with the model_name attribute:
class CustomOllama:
    def __init__(self, model_name: str, other_params: dict):
        self.model_name = model_name
        # Initialize other attributes as needed
        for key, value in other_params.items():
            setattr(self, key, value)

    async def agenerate_text(
        self,
        prompt: str,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = [],
    ) -> LLMResult:
        # Your implementation here
        pass

    # Add other methods as needed
Ensure that when you instantiate the CustomOllama class, you provide the model_name:
ollama = CustomOllama(model_name="llama3.1:8b", other_params={})
This should resolve the AttributeError by ensuring that the model_name attribute is properly defined and accessible in the CustomOllama class.
Object of type PromptValue is not JSON serializable ragas evaluation
To resolve the "Object of type PromptValue is not JSON serializable" error in Ragas evaluation, you need to ensure that the PromptValue object is converted to a JSON-serializable format before attempting to serialize it. This can be done by implementing a custom to_dict method in the PromptValue class and using it during serialization.
Here is an example of how you can modify the PromptValue class to include a to_dict method:
class PromptValue(BasePromptValue):
    prompt_str: str

    def to_messages(self) -> t.List[BaseMessage]:
        """Return prompt as a list of Messages."""
        return [HumanMessage(content=self.to_string())]

    def to_string(self) -> str:
        return self.prompt_str

    def to_dict(self) -> t.Dict[str, t.Any]:
        """Convert the PromptValue to a dictionary."""
        return {"prompt_str": self.prompt_str}
Then, when you need to serialize the PromptValue object, you can use the to_dict method:
import json
prompt_value = PromptValue(prompt_str="Your prompt string here")
json_data = json.dumps(prompt_value.to_dict())
This ensures that the PromptValue object is converted to a JSON-serializable dictionary before serialization [1].
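If you would rather not modify the class, another option (a general json sketch, not a Ragas API) is to give json.dumps a default function that falls back to to_dict or to_string when an object provides one:

import json

def _serialize(obj):
    # Fall back to to_dict()/to_string() when available, otherwise use str()
    if hasattr(obj, "to_dict"):
        return obj.to_dict()
    if hasattr(obj, "to_string"):
        return obj.to_string()
    return str(obj)

json_data = json.dumps({"prompt": prompt_value}, default=_serialize)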
Mean answer_correctness Score N/A
The mean answer_correctness score is showing as N/A because the function _ascore returns np.nan when the answers variable is None. This happens if the _output_parser.aparse method fails to parse the result text.
answers = await _output_parser.aparse(
    result_text, p_value, self.llm, self.max_retries
)
if answers is None:
    return np.nan
Ensure that the result text conforms to the expected format and schema to avoid parsing failures [1].
Error in str.on_chain_start callback: AttributeError("'str' object has no attribute 'ignore_chain'")
AttributeError in AnswerSimilarity callback: 'str' object has no attribute 'raise_error'
The errors you're encountering indicate that a string object is being used where a callback object is expected. Specifically, the str object does not have the attributes ignore_chain and raise_error, which are expected in the callback objects.
Resolving the Errors
1. Error in `str.on_chain_start` callback: `AttributeError("'str' object has no attribute 'ignore_chain'")`
   - Ensure that the `on_chain_start` method is being called on a valid callback object, not a string. The callback object should have the `ignore_chain` attribute.
2. AttributeError in `AnswerSimilarity` callback: `'str' object has no attribute 'raise_error'`
   - Ensure that the `AnswerSimilarity` callback is correctly instantiated and used. The callback object should have the `raise_error` attribute.
Correct Usage of Callbacks
Here is an example of how to correctly instantiate and use callback objects in the Ragas framework:
from ragas.callbacks import CallbackManagerForChainRun, CallbackManagerForChainGroup
# Example callback manager instantiation
callback_manager = CallbackManagerForChainRun(
handlers=[],
inheritable_handlers=[],
parent_run_id=None,
tags=[],
inheritable_tags=[],
metadata={},
inheritable_metadata={}
)
# Ensure you are using the correct callback objects
callbacks = [callback_manager]
# Example usage in evaluation
result = await evaluate(
dataset=test_cases,
metrics=[Faithfulness],
llm=llm,
embeddings=embed_model,
raise_exceptions=False,
callbacks=callbacks
)
Attributes of Callback Objects
Ensure that your callback objects have the necessary attributes:
- `ignore_chain`
- `raise_error`
These attributes are part of the expected structure of callback objects in the Ragas framework [1][2].
Example of a Custom Callback
If you need to create a custom callback, ensure it includes the required attributes:
class CustomCallback:
    def __init__(self):
        self.ignore_chain = False
        self.raise_error = True

    def on_chain_start(self, *args, **kwargs):
        # Your implementation here
        pass

    def on_chain_end(self, *args, **kwargs):
        # Your implementation here
        pass

    def on_chain_error(self, *args, **kwargs):
        # Your implementation here
        pass
By ensuring that your callback objects are correctly instantiated and include the necessary attributes, you should be able to resolve the AttributeError issues.
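Alternatively, because Ragas's callbacks build on LangChain's callback machinery, a handler that subclasses langchain_core's BaseCallbackHandler inherits ignore_chain, raise_error, and the expected hook signatures for free. A short sketch (the handler name and print statements are illustrative):

from langchain_core.callbacks.base import BaseCallbackHandler

class LoggingHandler(BaseCallbackHandler):
    """Inherits ignore_chain, raise_error, and the on_* hooks from BaseCallbackHandler."""

    def on_chain_start(self, serialized, inputs, **kwargs):
        print(f"chain started, inputs: {list(inputs)}")

    def on_chain_end(self, outputs, **kwargs):
        print("chain finished")

# Pass handler instances (never strings) wherever callbacks are expected
callbacks = [LoggingHandler()]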
Looks like a syntax issue. With the current state, try:
`from ragas.metrics import Faithfulness` and then `faithfulness_metric = Faithfulness()`
Or check out the new llm_factory.
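For example, a sketch assuming ragas 0.1.x, where llm_factory lives in ragas.llms and returns an OpenAI-backed wrapper by default (so OPENAI_API_KEY must be set; the model name is just an illustration):

from ragas.llms import llm_factory
from ragas.metrics import Faithfulness

llm = llm_factory("gpt-3.5-turbo")           # Ragas-wrapped LLM
faithfulness_metric = Faithfulness(llm=llm)  # pass this instance to evaluate(..., metrics=[faithfulness_metric])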
It seems the issue was answered, closing this now.