
`ValueError: No nodes that satisfied the given filter. Try changing the filter.` — raised because the `generate_personas_from_kg` function in the ragas library finds no nodes in the knowledge graph (KG) that match the filter criteria

Open gshravendra opened this issue 8 months ago • 3 comments

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I am trying to generate a ragas testset for numerous documents using the ragas TestsetGenerator, but I get this error: `ValueError: No nodes that satisfied the given filter. Try changing the filter.` It occurs because the `generate_personas_from_kg` function in the ragas library is unable to find any nodes in the knowledge graph (KG) that match the specified filter criteria.

Ragas version: 0.2.14
Python version: 3.13.2

Code to Reproduce

```python
from langchain_community.document_loaders import DirectoryLoader
import os
from dotenv import load_dotenv
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset import TestsetGenerator
import time

# Load environment variables
load_dotenv()

# Verify API key
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not set in environment variables.")

# Load documents
path = "Sample_Docs_Markdown/"
loader = DirectoryLoader(path, glob="**/*.md")
try:
    docs = loader.load()
except Exception as e:
    raise RuntimeError(f"Error loading documents: {e}")

# Preprocess documents to ensure required metadata exists
for doc in docs:
    if not hasattr(doc, "metadata") or not isinstance(doc.metadata, dict):
        doc.metadata = {}

    # Ensure every document has a 'summary' property
    if not doc.metadata.get("summary"):
        doc.metadata["summary"] = doc.page_content.strip()[:200]  # Use the first 200 characters as a fallback

    # Ensure a 'headlines' property exists
    if "headlines" not in doc.metadata:
        doc.metadata["headlines"] = []

print(f"Loaded {len(docs)} documents.")
print("Sample Document loaded successfully")

# Initialize LLM and embeddings
try:
    generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
    generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
except Exception as e:
    raise RuntimeError(f"Error initializing LLM or embeddings: {e}")

# Initialize Testset Generator
try:
    generator = TestsetGenerator.from_langchain(llm=generator_llm, embedding_model=generator_embeddings)
except Exception as e:
    raise RuntimeError(f"Error initializing Testset Generator: {e}")

# Retry logic for dataset generation
max_retries = 3
for attempt in range(max_retries):
    try:
        # Generate dataset
        dataset = generator.generate_with_langchain_docs(docs, testset_size=10)
        break
    except ValueError as err:
        print(f"Attempt {attempt + 1} failed: {err}")
        if attempt < max_retries - 1:
            time.sleep(2)  # Wait before retrying
        else:
            raise ValueError(
                f"Error generating dataset after {max_retries} attempts: {err}. "
                f"Please adjust document preprocessing or filter settings."
            )

# Convert the dataset to a pandas DataFrame and print the results
try:
    result = dataset.to_pandas()
    print("Generated Dataset:")
    print(result)

    # Save the dataset to a CSV file
    result.to_csv("generated_testset.csv", index=False)
    print("Dataset saved to 'generated_testset.csv'.")
except Exception as e:
    raise RuntimeError(f"Error processing or saving the dataset: {e}")
```

Error trace

```
Loaded 11 documents.
Sample Document loaded successfully
Applying HeadlinesExtractor:   0%| 0/5 [00:00<?, ?it/s]
unable to apply transformation: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
  (repeated for each of the 5 batches)
Applying HeadlineSplitter:   0%| 0/11 [00:00<?, ?it/s]
unable to apply transformation: 'headlines' property not found in this node
  (repeated for all 11 nodes)
Applying SummaryExtractor:   0%| 0/5 [00:00<?, ?it/s]
unable to apply transformation: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
  (repeated for each of the 5 batches)
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%| 0/5 [00:00<?, ?it/s]
unable to apply transformation: node.property('summary') must be a string, found '<class 'NoneType'>'
  (repeated 5 times)
Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%| 0/2 [00:00<?, ?it/s]
unable to apply transformation: Node 1c81d704-9181-40c1-9629-8c7d2e73f257 has no summary_embedding
Attempt 1 failed: No nodes that satisfied the given filter. Try changing the filter.
```
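The first failures in the trace (`'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'`) look like a double-wrapping problem: `agenerate_prompt` is a LangChain model method, and if `TestsetGenerator.from_langchain` wraps its arguments in `LangchainLLMWrapper` itself, passing an already-wrapped model hides that method, every extractor fails, the knowledge graph ends up without summaries or embeddings, and the persona filter finally matches nothing. A toy reproduction of that failure mode, using hypothetical `DummyLLM` and `Wrapper` stand-ins rather than the real ragas/LangChain classes:

```python
class DummyLLM:
    """Stand-in for a LangChain chat model: exposes agenerate_prompt."""
    def agenerate_prompt(self, prompt):
        return f"response to {prompt!r}"

class Wrapper:
    """Stand-in for an adapter that does NOT forward the wrapped model's API."""
    def __init__(self, llm):
        self._llm = llm

llm = DummyLLM()
once = Wrapper(llm)            # what a from_langchain-style constructor would do internally
twice = Wrapper(Wrapper(llm))  # pre-wrapping by the caller, then wrapping again

print(hasattr(llm, "agenerate_prompt"))         # True: the raw model has the method
print(hasattr(once._llm, "agenerate_prompt"))   # True: a single wrap still reaches it
print(hasattr(twice._llm, "agenerate_prompt"))  # False: the inner object is another Wrapper
```

If that is the cause here, passing the raw `ChatOpenAI` and `OpenAIEmbeddings` instances to `TestsetGenerator.from_langchain`, without pre-wrapping them in `LangchainLLMWrapper`/`LangchainEmbeddingsWrapper`, would be the first thing to try.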

Expected behavior
The script should load the documents, preprocess them, and generate a dataset without errors, saving the result as generated_testset.csv in the current directory.

Additional context
I want to generate a testset for multiple documents. Please suggest a better alternative if there is one for this type of use case, and let me know if anyone could help with ragas testset generation.

Thanks in Advance.🙏

gshravendra · Apr 01 '25 10:04