
AttributeError

Open amin-kh96 opened this issue 1 year ago • 8 comments

I defined this function to avoid using LLM calls, but it still gives me this error:

Exception has occurred: AttributeError
'list' object has no attribute 'get_sample_type'
  File "C:\Users\Amin\OneDrive - unige.it\Documenti\projectss\ragas-prototype\src\rag_eva.py", line 222
    evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)  # Pass your custom LLM here
AttributeError: 'list' object has no attribute 'get_sample_type'

my code is this:

# Define the custom LLM class
class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        if not self.api_key:
            return f"Processed: {prompt} (without API key)"
        else:
            # Handle LLM response if using an API
            return f"Processed: {prompt} (with API key: {self.api_key})"

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = self._call(prompt)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; directly call the sync method for now
        text = self._call(prompt)
        return LLMResult(generations=[[Generation(text=text)]])

amin-kh96 avatar Oct 08 '24 12:10 amin-kh96

To fix the AttributeError: 'list' object has no attribute 'get_sample_type', ensure that you are using an EvaluationDataset instance instead of a list when calling the evaluate function. Here's how you can create an EvaluationDataset correctly:

  1. Convert your list to an EvaluationDataset:

    If you have a list of samples, you need to convert it to an EvaluationDataset:

    from ragas import EvaluationDataset
    
    # Assuming ragas_data is your list of samples
    evaluation_dataset = EvaluationDataset.from_list(ragas_data)
    
    # Use evaluation_dataset in the evaluate function
    evaluation_report = evaluate(evaluation_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)
    
  2. Ensure your samples are structured correctly:

    Make sure each sample in your list is an instance of SingleTurnSample or MultiTurnSample, as required by the EvaluationDataset [1][2].
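
    For reference, here is a minimal sketch of that structure (assuming ragas 0.2+, where SingleTurnSample uses the user_input, retrieved_contexts, and response field names):

    from ragas import EvaluationDataset, SingleTurnSample

    # Each sample carries one question/answer pair plus its retrieved contexts
    samples = [
        SingleTurnSample(
            user_input="What is the capital of France?",
            retrieved_contexts=["Paris is the capital of France."],
            response="Paris",
        )
    ]
    evaluation_dataset = EvaluationDataset(samples=samples)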

By following these steps, you should be able to resolve the error and proceed with your evaluation.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 08 '24 12:10 dosubot[bot]

@dosu. How do I fix this error?

Exception has occurred: AttributeError
'list' object has no attribute 'rename_columns'
  File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 196
    evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)  # Pass your custom LLM here
AttributeError: 'list' object has no attribute 'rename_columns'

amin-kh96 avatar Oct 08 '24 15:10 amin-kh96

To fix the AttributeError: 'list' object has no attribute 'rename_columns', you need to convert your list to a compatible data structure that supports the rename_columns method. Here's how you can do it:

  1. Convert your list to an EvaluationDataset:

    If you have a list of samples, convert it to an EvaluationDataset:

    from ragas import EvaluationDataset
    
    # Assuming ragas_data is your list of samples
    evaluation_dataset = EvaluationDataset.from_list(ragas_data)
    
    # Use evaluation_dataset in the evaluate function
    evaluation_report = evaluate(evaluation_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)
    

Ensure that each sample in your list is structured correctly as required by the EvaluationDataset [1][2].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 08 '24 15:10 dosubot[bot]

@dosu. But EvaluationDataset does not exist in the Ragas library. Which version of ragas are you using?

amin-kh96 avatar Oct 08 '24 15:10 amin-kh96

The EvaluationDataset class might not exist in the current version of the Ragas library. Instead, you can use the available methods to handle datasets or lists of samples. Here are some relevant methods:

  • from_list(): Creates a dataset from a list of dictionaries.
  • to_pandas(): Converts samples to a Pandas DataFrame.
  • to_hf_dataset(): Converts samples to a Hugging Face dataset.

Ensure your samples are instances of SingleTurnSample or MultiTurnSample for compatibility [1][2].
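
You can also check which version you have installed; EvaluationDataset is only importable on sufficiently new releases (it appears around ragas 0.2, an assumption worth verifying against the changelog):

import ragas

print(ragas.__version__)
# On newer releases this import should succeed:
# from ragas import EvaluationDataset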

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 08 '24 15:10 dosubot[bot]

@dosu. I face this error: AttributeError: 'list' object has no attribute 'rename_columns'. My code is this:

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

The important thing is that I have lists of floats (computed embeddings), not strings.

amin-kh96 avatar Oct 08 '24 15:10 amin-kh96

To resolve the AttributeError: 'list' object has no attribute 'rename_columns', you need to convert your list of dictionaries into a compatible dataset format that supports the rename_columns method. Here's how you can do it using the from_list method of the EvaluationDataset class:

  1. Convert your list to an EvaluationDataset:

    First, ensure your data is structured correctly and then convert it:

    from ragas import EvaluationDataset
    
    # Assuming ragas_data is your list of dictionaries
    evaluation_dataset = EvaluationDataset.from_list(ragas_data)
    
    # Now you can use evaluation_dataset in the evaluate function
    evaluation_report = evaluate(evaluation_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)
    
  2. Ensure your data structure is correct:

    Make sure each dictionary in your list has the correct keys and values as required by the EvaluationDataset:

    ragas_data = [
        {
            "question": entry["text_vector_1"],
            "answer": entry["text_vector_2"],
            "contexts": entry["chunks"]
        }
        for entry in data_set
    ]
    

This should help you avoid the AttributeError by using a compatible dataset structure [1].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 08 '24 15:10 dosubot[bot]

@dosu. How do I fix this error?

Exception has occurred: AttributeError
'list' object has no attribute 'rename_columns'
  File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 196
    evaluation_report = evaluate(ragas_data, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)  # Pass your custom LLM here
AttributeError: 'list' object has no attribute 'rename_columns'

I have a similar issue when using a Pandas DataFrame.

Error:

Traceback (most recent call last):
  File "/Users/tobiasmolland/GitHub/Larry-Llama/scripts/ragas_generate_testset.py", line 49, in <module>
    result = evaluate(
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/ragas/_analytics.py", line 129, in wrapper
    result = func(*args, **kwargs)
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/ragas/evaluation.py", line 161, in evaluate
    dataset = remap_column_names(dataset, column_map)
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/ragas/validation.py", line 19, in remap_column_names
    return dataset.rename_columns(inverse_column_map)
  File "/Users/tobiasmolland/GitHub/Larry-Llama/venv/lib/python3.12/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'rename_columns'

Code used:

# import dependencies
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# set models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

# create generator with models
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# generate testset
testset = generator.generate_with_langchain_docs(documents, test_size=5, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
df = testset.to_pandas()

# evaluate testset
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)

from ragas import evaluate

result = evaluate(
    df,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)
result

Any solution to this yet?

ToBeAss avatar Oct 11 '24 10:10 ToBeAss

No, I changed the format multiple times, but it still gives me the error. I will share it here if I find any solution.

amin-kh96 avatar Oct 14 '24 07:10 amin-kh96

To resolve the AttributeError: 'DataFrame' object has no attribute 'rename_columns', you need to convert your Pandas DataFrame to a format that supports the rename_columns method. Here's how you can do it:

  1. Convert the DataFrame to a compatible dataset:

    Use the from_pandas method if available, or convert it to a list of dictionaries and then to a compatible dataset format:

    from ragas import EvaluationDataset
    
    # Convert DataFrame to a list of dictionaries
    ragas_data = df.to_dict(orient='records')
    
    # Create an EvaluationDataset
    evaluation_dataset = EvaluationDataset.from_list(ragas_data)
    
    # Use evaluation_dataset in the evaluate function
    result = evaluate(
        evaluation_dataset,
        metrics=[
            context_precision,
            faithfulness,
            answer_relevancy,
            context_recall,
        ],
    )
    
  2. Ensure your data structure is correct:

    Make sure each dictionary in your list has the correct keys and values as required by the EvaluationDataset.

This approach should help you avoid the AttributeError by using a compatible dataset structure. If EvaluationDataset is not available, ensure you are using the correct version of the library or check for any updates that might include this functionality.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 07:10 dosubot[bot]

@dosu. The EvaluationDataset class does not exist in my ragas library.

amin-kh96 avatar Oct 14 '24 07:10 amin-kh96

If the EvaluationDataset class is not available in your version of the Ragas library, you can handle datasets using alternative methods. Here’s how you can proceed:

  1. Convert Your Data:

    • If you have a list of dictionaries, you can convert it to a format compatible with the evaluation functions. Use methods like to_hf_dataset() if available.
  2. Ensure Correct Structure:

    • Make sure your data is structured correctly with keys and values that match the expected input for evaluation functions.
  3. Use Available Methods:

    • Check for methods like from_list() or to_pandas() to convert your data into a compatible format.
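
As a concrete stand-in (the exact Ragas helpers depend on your installed version), the Hugging Face datasets library offers the same conversions:

from datasets import Dataset

# Assuming ragas_data is a list of dicts with "question", "contexts", "answer" keys
hf_dataset = Dataset.from_list(ragas_data)
df = hf_dataset.to_pandas()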

If these methods are not available, consider checking the library documentation or updating to a version that includes these features.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 07:10 dosubot[bot]

@dosu. This part of my code relates to my dataset. Take a look at it and tell me which parts require changes.

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Create a Dataset using the Hugging Face datasets library
ragas_dataset = Dataset.from_dict(formatted_data)

amin-kh96 avatar Oct 14 '24 07:10 amin-kh96

To ensure your dataset is compatible with the Ragas evaluation framework, you need to make sure it matches the expected input structure. Here's how you can adjust your code:

  1. Ensure Correct Structure:

    • The question, answer, and contexts fields should be structured as follows:
      • question: A string.
      • answer: A string.
      • contexts: A list of strings.
  2. Convert to Compatible Format:

    • If you're using the Hugging Face datasets library, ensure the dataset is structured correctly before passing it to the evaluate function.

Here's a revised version of your code:

from datasets import Dataset

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Should be a string
        "answer": entry["text_vector_2"],    # Should be a string
        "contexts": entry["chunks"]          # Should be a list of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Create a Dataset using the Hugging Face `datasets` library
ragas_dataset = Dataset.from_dict(formatted_data)

# Ensure the dataset is compatible with the evaluate function
# You may need to convert it to a format that the evaluate function expects

Make sure each entry in ragas_data has the correct types for question, answer, and contexts. If the evaluate function requires a specific dataset type, you might need to convert it accordingly.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 07:10 dosubot[bot]

@dosu. I see this error:

Exception has occurred: ValueError
Dataset feature "question" should be of type string
  File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 199
    evaluation_report = evaluate(ragas_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm)  # Pass your custom LLM here
ValueError: Dataset feature "question" should be of type string

The format is like this:

formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Create a Dataset using the Hugging Face datasets library
ragas_dataset = Dataset.from_dict(formatted_data)

amin-kh96 avatar Oct 14 '24 10:10 amin-kh96

Ensure that all elements in the question list are strings. Here's how you can check and fix this:

  1. Verify Data Types: Make sure each entry["question"] is a string. If it's not, convert it to a string.

  2. Update Your Code: Modify the list comprehension to ensure all questions are strings:

    formatted_data = {
        "question": [str(entry["question"]) for entry in ragas_data],
        "contexts": [entry["contexts"] for entry in ragas_data],
        "answer": [entry["answer"] for entry in ragas_data]
    }
    
  3. Check the Source Data: Ensure that the source data (data_set) has the correct types for each field.

By ensuring all question entries are strings, you should resolve the ValueError [1].
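
A quick sanity check can locate any offending rows before the Dataset is built (a plain-Python snippet, not part of the Ragas API):

# Print every row whose "question" is not a plain string
for i, entry in enumerate(ragas_data):
    if not isinstance(entry["question"], str):
        print(f'Row {i}: question is {type(entry["question"]).__name__}, expected str')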

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 10:10 dosubot[bot]

@dosu. But I have floats, not strings: I have embeddings for the question and the answer, and also for the chunks.

amin-kh96 avatar Oct 14 '24 10:10 amin-kh96

To handle embeddings (floats) for the question and answer fields in the Ragas evaluation framework, you need to ensure compatibility by using the column_map parameter in the evaluate function. Here's how you can do it:

  1. Use the column_map Parameter: Map your embedding columns to the expected names.

  2. Set Up Your Code: Ensure you have an appropriate embedding handler.

Here's an example:

from ragas import evaluate
from ragas.embeddings.base import BaseRagasEmbeddings

# Assuming `my_embeddings` is an instance of BaseRagasEmbeddings or a compatible class
# and `ragas_dataset` is your dataset with embeddings

column_map = {
    "question": "question_embeddings",
    "answer": "answer_embeddings"
}

result = evaluate(
    dataset=ragas_dataset,
    embeddings=my_embeddings,
    column_map=column_map
)

print(result)

Make sure your dataset columns are correctly mapped and that you have the necessary embedding handler set up [1].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 10:10 dosubot[bot]

@dosu How do I adapt or create custom evaluation logic that directly supports embeddings?

amin-kh96 avatar Oct 14 '24 12:10 amin-kh96

To create a custom evaluation logic that directly supports embeddings in the Ragas framework, you can use the evaluate function with custom embeddings. Here's how:

  1. Wrap Your Embeddings: Use LangchainEmbeddingsWrapper to wrap your custom embeddings.

  2. Pass to Evaluate Function: Pass the wrapped embeddings to the evaluate function.

Here's a code snippet:

from ragas.embeddings.base import LangchainEmbeddingsWrapper
from ragas.evaluation import evaluate
from datasets import Dataset

# Assuming `my_embeddings` is your custom embeddings object
wrapped_embeddings = LangchainEmbeddingsWrapper(my_embeddings)

# Create your dataset
dataset = Dataset.from_dict({
    "question": ["What is AI?", "Explain machine learning."],
    "ground_truth": ["Artificial Intelligence is...", "Machine learning is a subset of AI..."]
})

# Define your metrics
metrics = [...]  # List of Metric objects

# Call the evaluate function
results = evaluate(
    dataset=dataset,
    metrics=metrics,
    embeddings=wrapped_embeddings
)

print(results)

This setup allows you to integrate custom embeddings into the evaluation process directly [1].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 12:10 dosubot[bot]

@dosu But I wrote these lines of code; what is the point of them? What should I do with them now?

class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custom_embeddings: list = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.

        Parameters:
            model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
            custom_embeddings (list): A list of pre-computed custom embeddings (optional).
        """
        model_name = 'distilbert-base-uncased'  # Note: this shadows the model_name parameter
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.custom_embeddings = custom_embeddings  # Store the custom embeddings

    def embed_documents(self, texts: list) -> np.ndarray:
        """
        Generate embeddings for a list of documents.

        Parameters:
            texts (list): A list of documents to embed.

        Returns:
            np.ndarray: An array of embeddings for the documents.
        """
        if self.custom_embeddings is not None:
            # If custom embeddings are provided, return those instead
            return np.array(self.custom_embeddings)

        # Generate new embeddings using the model if no custom embeddings are available
        inputs = self.tokenizer(texts, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the pooled output or the CLS token as the embedding
        embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token for sentence embedding
        return embeddings.numpy()  # Convert to NumPy array

    def embed_query(self, query: str) -> np.ndarray:
        """
        Generate an embedding for a single query.

        Parameters:
            query (str): The query to embed.

        Returns:
            np.ndarray: The embedding for the query.
        """
        # If custom embeddings are provided, generate embedding based on those
        if self.custom_embeddings is not None:
            # You might want to handle how to relate the query to your custom embeddings
            raise NotImplementedError("Custom query embeddings are not supported with provided custom embeddings.")

        # Generate a new embedding using the model
        inputs = self.tokenizer(query, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the pooled output or the CLS token as the embedding
        embedding = outputs.last_hidden_state[:, 0, :]  # CLS token for single query embedding
        return embedding.numpy()  # Convert to NumPy array

# Initialize the custom embeddings class
custom_embeddings = CustomHuggingFaceRagasEmbeddings(ragas_dataset)  # Note: this passes the dataset where model_name is expected

ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name=model_name, custom_embeddings=custom_embeddings)

amin-kh96 avatar Oct 14 '24 12:10 amin-kh96

The code you wrote defines a custom embeddings class, CustomHuggingFaceRagasEmbeddings, which integrates Hugging Face models to generate or use pre-computed embeddings. Here's how you can use it:

  1. Purpose of the Code:

    • Custom Embeddings: It allows you to use pre-computed embeddings if available, or generate new ones using a specified Hugging Face model.
    • Flexibility: You can switch between using pre-computed embeddings and generating new ones based on your needs.
  2. Next Steps:

    • Instantiate the Class: Ensure you initialize the class with the correct parameters. If you have pre-computed embeddings, pass them to the custom_embeddings parameter.
    • Use in Evaluation: Pass an instance of this class to the evaluate function as the embeddings parameter.
  3. Example Usage:

    # Initialize with a model name and optional custom embeddings
    custom_embeddings = CustomHuggingFaceRagasEmbeddings(
        model_name='distilbert-base-uncased',
        custom_embeddings=your_precomputed_embeddings  # Replace with your embeddings if available
    )
    
    # Use in the evaluate function
    result = evaluate(
        dataset=ragas_dataset,
        metrics=metrics,
        embeddings=custom_embeddings
    )
    
    print(result)
    

Ensure that your dataset and metrics are correctly set up to work with the embeddings. This setup allows you to leverage both pre-computed and dynamically generated embeddings in your evaluation process.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 12:10 dosubot[bot]

@dosu @jjmachan You don't understand my point. I created a custom embeddings class to use with a custom evaluation function (one that directly supports embeddings), but when I run it I see the error "the question must be a string". This is my code:

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for the Hugging Face Dataset creation
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Define the column_map to match custom columns to expected ones
column_map = {
    "question": "question",  # Match to the correct column
    "answer": "answer",      # Match to the correct column
    "contexts": "contexts"   # Match to the correct column
}

# Create a Dataset using the Hugging Face datasets library
ragas_dataset = Dataset.from_dict(formatted_data)

model_name = 'distilbert-base-uncased'

class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custom_embeddings: list = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.

        Parameters:
            model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
            custom_embeddings (list): A list of pre-computed custom embeddings (optional).
        """
        model_name = 'distilbert-base-uncased'  # Note: this shadows the model_name parameter
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.custom_embeddings = custom_embeddings  # Store the custom embeddings

    def embed_documents(self, texts: list) -> np.ndarray:
        """
        Generate embeddings for a list of documents.

        Parameters:
            texts (list): A list of documents to embed.

        Returns:
            np.ndarray: An array of embeddings for the documents.
        """
        if self.custom_embeddings is not None:
            # If custom embeddings are provided, return those instead
            return np.array(self.custom_embeddings)

        # Generate new embeddings using the model if no custom embeddings are available
        inputs = self.tokenizer(texts, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the pooled output or the CLS token as the embedding
        embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token for sentence embedding
        return embeddings.numpy()  # Convert to NumPy array

    def embed_query(self, query: str) -> np.ndarray:
        """
        Generate an embedding for a single query.

        Parameters:
            query (str): The query to embed.

        Returns:
            np.ndarray: The embedding for the query.
        """
        # If custom embeddings are provided, generate embedding based on those
        if self.custom_embeddings is not None:
            # You might want to handle how to relate the query to your custom embeddings
            raise NotImplementedError("Custom query embeddings are not supported with provided custom embeddings.")

        # Generate a new embedding using the model
        inputs = self.tokenizer(query, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the pooled output or the CLS token as the embedding
        embedding = outputs.last_hidden_state[:, 0, :]  # CLS token for single query embedding
        return embedding.numpy()  # Convert to NumPy array

# Initialize the custom embeddings class
custom_embeddings = CustomHuggingFaceRagasEmbeddings(ragas_dataset)  # Note: this passes the dataset where model_name is expected

ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name=model_name, custom_embeddings=custom_embeddings)

# Define the custom LLM class
class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        if not self.api_key:
            return f"Processed: {prompt} (without API key)"
        else:
            # Handle LLM response if using an API
            return f"Processed: {prompt} (with API key: {self.api_key})"

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = self._call(prompt)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; directly call the sync method for now
        text = self._call(prompt)
        return LLMResult(generations=[[Generation(text=text)]])

custom_llm = CustomRagasLLM(api_key=None)

# Define the evaluation metrics
metrics = [context_utilization]

# Create a Dataset using the Hugging Face datasets library
# ragas_dataset = Dataset.from_dict(formatted_data)

# Run the evaluation
evaluation_report = evaluate(ragas_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm, column_map=column_map)  # Pass your custom LLM here

print("RAGAS Evaluation Report:")
print(evaluation_report)

amin-kh96 avatar Oct 14 '24 13:10 amin-kh96

To address the error "the question must be a string," you need to ensure that the question field in your dataset is a string, not an embedding or list of floats. Here's how you can adapt your setup:

  1. Convert Embeddings to Strings: If your question field contains embeddings, you need to convert them to a string representation suitable for evaluation. This might involve using a placeholder or a descriptive string.

  2. Modify the Dataset: Ensure that the question field in formatted_data is a string:

    formatted_data = {
        "question": [str(entry["question"]) for entry in ragas_data],  # Convert to string
        "contexts": [entry["contexts"] for entry in ragas_data],
        "answer": [entry["answer"] for entry in ragas_data]
    }
    
  3. Use the column_map Correctly: Ensure that the column_map is correctly mapping your dataset fields to the expected names in the evaluation function.

  4. Custom Embeddings Handling: If you want to use embeddings directly, ensure your evaluation logic can handle them. This might involve customizing the evaluation function to work with embeddings instead of strings.

By ensuring the question field is a string, you should resolve the error. If you need to evaluate using embeddings, consider adapting the evaluation logic to handle embeddings directly, possibly by modifying the evaluation function to accept and process embeddings.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 13:10 dosubot[bot]

@dosu How can I extend or override the evaluation method? I want to create a custom evaluation function that bypasses text input and operates directly on embeddings. I want to modify it so that it doesn't expect raw text but instead accepts embeddings as the primary data format. This requires modifying the parts of the evaluation pipeline that deal with text and replacing them with operations on vectors.

amin-kh96 avatar Oct 14 '24 13:10 amin-kh96

To create a custom evaluation function that operates directly on embeddings, you can extend or override the existing evaluation method in the Ragas framework. Here's a step-by-step guide:

  1. Modify the Evaluation Function: Create a custom function that processes embeddings instead of raw text.

  2. Implement Custom Logic: Use the embeddings to perform your evaluation logic.

Here's an example of how you can achieve this:

# Assumed imports for this sketch; adjust to your installed ragas version
import typing as t

from ragas.embeddings.base import BaseRagasEmbeddings
from ragas.evaluation import Result
from ragas.run_config import RunConfig

def custom_evaluate(
    dataset,
    metrics,
    embeddings: BaseRagasEmbeddings,
    raise_exceptions: bool = False,
    column_map: t.Optional[t.Dict[str, str]] = None,
    run_config: t.Optional[RunConfig] = None,
) -> Result:
    if embeddings is None:
        raise ValueError("Embeddings must be provided for custom evaluation!")

    # Extract embeddings for questions and answers
    query_embeddings = [embeddings.embed_query(q) for q in dataset["question"]]
    answer_embeddings = [embeddings.embed_query(a) for a in dataset["answer"]]

    # Implement your custom evaluation logic here
    results = []
    for qe, ae in zip(query_embeddings, answer_embeddings):
        similarity = calculate_similarity(qe, ae)  # Define this function as needed
        results.append(similarity)

    # Return results in the expected format
    return results

# Example usage
# Ensure your dataset and embeddings are set up correctly
result = custom_evaluate(
    dataset=your_dataset,
    metrics=your_metrics,
    embeddings=your_custom_embeddings
)

print(result)

Key Points:

  • Embeddings Handling: Ensure your embeddings object can provide embeddings for the data.
  • Custom Logic: Implement calculate_similarity or any other logic needed to evaluate the embeddings.
  • Dataset Structure: Ensure your dataset is structured to provide the necessary data for embedding extraction.

This approach allows you to bypass text processing and directly work with embeddings in your evaluation pipeline [1].
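
One possible definition of the calculate_similarity helper left open above is plain cosine similarity between the two vectors (a sketch, not part of the Ragas API):

import numpy as np

def calculate_similarity(qe, ae) -> float:
    """Cosine similarity between two embedding vectors."""
    qe = np.asarray(qe, dtype=float)
    ae = np.asarray(ae, dtype=float)
    return float(np.dot(qe, ae) / (np.linalg.norm(qe) * np.linalg.norm(ae)))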

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 13:10 dosubot[bot]

from ragas import EvaluationDataset

No, my dataset is not a list or a str, and it is not a pandas DataFrame either.

(Quoting ToBeAss's full comment above, with the 'DataFrame' object has no attribute 'rename_columns' traceback and the question "Any solution to this yet?")

Use the column_map parameter.

amin-kh96 avatar Oct 14 '24 14:10 amin-kh96

To resolve the AttributeError: 'DataFrame' object has no attribute 'rename_columns', you should use the remap_column_names function from the Ragas library. This function is designed to handle column renaming for datasets used in the evaluate function.

Here's how you can apply it:

  1. Use remap_column_names:

    Ensure you have a column_map dictionary that maps your current DataFrame column names to the expected ones.

  2. Apply the Function:

    from ragas.validation import remap_column_names
    
    # Define your column map
    column_map = {
        "current_question_column": "question",
        "current_answer_column": "answer",
        "current_contexts_column": "contexts"
    }
    
    # Remap the DataFrame columns
    df = remap_column_names(df, column_map)
    
    # Now use the DataFrame in the evaluate function
    result = evaluate(
        df,
        metrics=[
            context_precision,
            faithfulness,
            answer_relevancy,
            context_recall,
        ],
    )
    

Ensure your column_map correctly reflects the current and expected column names [1].
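
Note that, per the traceback above, remap_column_names ends up calling rename_columns, which exists on a Hugging Face Dataset but not on a pandas DataFrame; if you start from a DataFrame, converting it first may be necessary (a sketch using the standard datasets API):

from datasets import Dataset

# Convert the pandas DataFrame into a Hugging Face Dataset, which supports rename_columns
hf_dataset = Dataset.from_pandas(df)

result = evaluate(
    hf_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)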

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 14 '24 14:10 dosubot[bot]

Hi! Something similar happens to my evaluation dataset, which is a dataframe with columns: ['user_input', 'reference_contexts', 'reference', 'synthesizer_name'], created with generator.generate_with_langchain_docs. The error is:

AttributeError: 'DataFrame' object has no attribute 'get_sample_type'.

My code:

from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, SemanticSimilarity
from ragas import evaluate

generator_embeddings = LangchainEmbeddingsWrapper(embedder)
generator_llm = LangchainLLMWrapper(llm)

metrics = [
    LLMContextRecall(llm=generator_llm), 
    FactualCorrectness(llm=generator_llm), 
    Faithfulness(llm=generator_llm),
    SemanticSimilarity(embeddings=generator_embeddings)
]
results = evaluate(dataset=df_test, metrics=metrics)

rcruzgar avatar Nov 11 '24 17:11 rcruzgar