the generated testset is empty

Open Kevinddddddd opened this issue 1 year ago • 11 comments

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

ragas version: 0.28.0

I used the script in the documentation to generate a non-English testset, but the output is empty.

Code Examples

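from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.persona import Persona
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

# api_version is assumed to be defined earlier (the Azure OpenAI API version string).
chat_model = AzureChatOpenAI(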
    api_version=api_version,
    model="gpt-4o",
    azure_deployment="gpt-4o"
)
embedding_model = AzureOpenAIEmbeddings(
    model="text-embedding-3-large-turing",
    api_version=api_version,
    azure_deployment="text-embedding-3-large-turing"
)
generator_llm = LangchainLLMWrapper(chat_model)
generator_embeddings = LangchainEmbeddingsWrapper(embedding_model)

personas = [
    Persona(
        name="curious student",
        role_description="A student who is curious about the world and wants to learn more about different cultures and languages",
    ),
]

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings, persona_list=personas
)

distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0),
]

path = "/data/eco_rag/testdata"
loader = DirectoryLoader(path, loader_cls=TextLoader, show_progress=True)
docs = loader.load()

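# Adapt each synthesizer's prompts to Spanish.
# Top-level await assumes a notebook or another async context.
for query, _ in distribution: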
    prompts = await query.adapt_prompts("spanish", llm=generator_llm)
    query.set_prompts(**prompts)

dataset = generator.generate_with_langchain_docs(
    docs,
    testset_size=3,
    query_distribution=distribution
)

Additional context

The output log is:

100%|██████████| 1/1 [00:00<00:00, 730.59it/s]
Applying HeadlinesExtractor: 0%| | 0/1 [00:00<?, ?it/s]
Property 'summary' already exists in node 'e425ec'. Skipping!
Property 'summary_embedding' already exists in node 'e425ec'. Skipping!
Generating Scenarios: 0%| | 0/1 [00:00<?, ?it/s]
Generating Samples: 0it [00:00, ?it/s]

The corpus contains only one document, downloaded from https://huggingface.co/datasets/explodinggradients/Sample_non_english_corpus

Kevinddddddd avatar Dec 18 '24 08:12 Kevinddddddd

When I change the document from Tokyo.txt to Madrid.txt, it works. Why does this happen?

Kevinddddddd avatar Dec 18 '24 08:12 Kevinddddddd

I have tested models from all vendors, and only OpenAI's GPT-4 model can generate test datasets from Chinese documents. The others are either rate-limited or generate empty test datasets. I hope the maintainers can fix this soon so that the tool becomes useful.

HansonJames avatar Dec 30 '24 05:12 HansonJames

The Tongyi model supports Chinese documents. I tried qwen-max and it works.

hy813 avatar Jan 04 '25 13:01 hy813

Is there a speed limit?

HansonJames avatar Jan 04 '25 13:01 HansonJames

Yes, it is limited by TPM and QPM. See the official DashScope documentation: https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-thousand-questions-metering-and-billing

hy813 avatar Jan 04 '25 13:01 hy813

The qwen-max model doesn't work either!

HansonJames avatar Jan 05 '25 03:01 HansonJames

Same here in Thai language :[

takipipo avatar Feb 27 '25 05:02 takipipo

I tried both qwen-max and qwen-plus; they work only intermittently. I suspect that rate limiting on the LLM platform is what prevents any output.
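If rate limiting really is the cause (the thread does not confirm this), one way to test the idea is to wrap the rate-limited call in a retry loop with exponential backoff and see whether the empty results go away. The sketch below is generic Python and not part of ragas; `RateLimitError` and `call_with_backoff` are hypothetical names, and a real provider SDK raises its own exception type that you would catch instead.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's 'too many requests' error."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter on RateLimitError."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                # Out of retries: surface the error instead of silently
                # returning nothing.
                raise
            # Delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable only so the helper can be exercised without actually waiting; in normal use the default `time.sleep` applies.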

FOXandrabbi avatar Feb 28 '25 04:02 FOXandrabbi

This documentation shows how to generate a test set for a non-English language: https://docs.ragas.io/en/stable/howtos/customizations/testgenerator/_language_adaptation/#load-and-adapt-queries

takipipo avatar Feb 28 '25 04:02 takipipo

I encountered the same issue on deepseek-chat.

import asyncio

from langchain_community.document_loaders import TextLoader
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_huggingface import HuggingFaceEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.persona import Persona
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer
from ragas.testset.transforms.extractors.llm_based import NERExtractor
from ragas.testset.transforms.splitters import HeadlineSplitter

class TestCallback(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print(f"**********Prompts*********:\n {prompts[0]}\n\n")

    def on_llm_end(self, response, **kwargs):
        print(f"**********Response**********:\n {response}\n\n")

llm = ChatOpenAI(model='deepseek-chat', base_url='https://api.deepseek.com/v1', callbacks=[TestCallback()])
embeddings = HuggingFaceEmbeddings(model_name='BAAI/bge-m3', model_kwargs={'trust_remote_code': True})
loader = TextLoader('doc.txt', encoding='utf-8') 
documents = loader.load()

personas = [
    Persona(
        name="Curious Student",
        role_description="A student who is curious about the world and wants to learn more about different cultures and languages",
    ),
]

generator_llm = LangchainLLMWrapper(llm)
generator_embeddings = LangchainEmbeddingsWrapper(embeddings)

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings, persona_list=personas)
query = SingleHopSpecificQuerySynthesizer(llm=generator_llm)
prompts = asyncio.run(query.adapt_prompts('chinese', llm=generator_llm))
query.set_prompts(**prompts)
transforms = [HeadlineSplitter(), NERExtractor(llm=generator_llm)]
dist = [(query, 1.0)]
dataset = generator.generate_with_langchain_docs(documents, testset_size=1, transforms=transforms, query_distribution=dist)
**********Prompts*********:
 Human: Given a list of themes and personas with their roles, associate each persona with relevant themes based on their role description.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"properties": {"mapping": {"additionalProperties": {"items": {"type": "string"}, "type": "array"}, "title": "Mapping", "type": "object"}}, "required": ["mapping"], "title": "PersonaThemesMapping", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.

--------EXAMPLES-----------
Example 1
Input: {
    "themes": [
        "同理心",
        "包容性",
        "远程工作"
    ],
    "personas": [
        {
            "name": "人力资源经理",
            "role_description": "专注于包容性和员工支持。"
        },
        {
            "name": "远程团队领导",
            "role_description": "管理远程团队沟通。"
        }
    ]
}
Output: {
    "mapping": {
        "HR Manager": [
            "包容性",
            "同理心"
        ],
        "Remote Team Lead": [
            "远程工作",
            "同理心"
        ]
    }
}
-----------------------------

Now perform the same with the following input
input: {
    "themes": [
        "RAGFlow",
        "Docker",
        "Elasticsearch",
        "Infinity",
        "MinIO",
        "Redis",
        "MySQL",
        "HuggingFace",
        "Python",
        "Linux"
    ],
    "personas": [
        {
            "name": "Curious Student",
            "role_description": "A student who is curious about the world and wants to learn more about different cultures and languages"
        }
    ]
}
Output: 


Generating Scenarios: 100%|██████████| 1/1 [00:06<00:00,  6.75s/it]
**********Response**********:
 generations=[[ChatGeneration(text='{\n    "mapping": {\n        "Curious Student": []\n    }\n}', generation_info={'finish_reason': 'stop', 'logprobs': None}, message=AIMessage(content='{\n    "mapping": {\n        "Curious Student": []\n    }\n}', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 395, 'total_tokens': 412, 'completion_tokens_details': None, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 384}, 'prompt_cache_hit_tokens': 384, 'prompt_cache_miss_tokens': 11}, 'model_name': 'deepseek-chat', 'system_fingerprint': 'fp_3a2571e1b4_prod0225', 'finish_reason': 'stop', 'logprobs': None}, id='run-00f8660d-4ff7-4a16-90fe-d43b6647e0e5-0', usage_metadata={'input_tokens': 395, 'output_tokens': 17, 'total_tokens': 412, 'input_token_details': {'cache_read': 384}, 'output_token_details': {}}))]] llm_output={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 395, 'total_tokens': 412, 'completion_tokens_details': None, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 384}, 'prompt_cache_hit_tokens': 384, 'prompt_cache_miss_tokens': 11}, 'model_name': 'deepseek-chat', 'system_fingerprint': 'fp_3a2571e1b4_prod0225'} run=None type='LLMResult'


Generating Samples: 0it [00:00, ?it/s]
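
Two things stand out in the log above: the model returns an empty theme list for the only persona, and the adapted few-shot example uses translated persona names in its input but English names as keys in its output. Whether either of these is what leads to `Generating Samples: 0it` is an inference from this log, not something confirmed in the thread. A minimal sketch for spotting both conditions in captured prompts and responses follows; the helper names `empty_personas` and `mismatched_example_keys` are hypothetical and not part of ragas.

```python
import json


def empty_personas(response_text):
    """Return persona names whose theme list is empty in a mapping reply."""
    mapping = json.loads(response_text)["mapping"]
    return [name for name, themes in mapping.items() if not themes]


def mismatched_example_keys(example_input, example_output):
    """Return output mapping keys that match no persona name in the input."""
    names = {persona["name"] for persona in example_input["personas"]}
    return sorted(key for key in example_output["mapping"] if key not in names)


# Model reply copied from the log above.
reply = '{\n    "mapping": {\n        "Curious Student": []\n    }\n}'
print(empty_personas(reply))  # ['Curious Student']

# Few-shot example copied from the adapted prompt in the log above.
example_input = {
    "themes": ["同理心", "包容性", "远程工作"],
    "personas": [
        {"name": "人力资源经理", "role_description": "专注于包容性和员工支持。"},
        {"name": "远程团队领导", "role_description": "管理远程团队沟通。"},
    ],
}
example_output = {
    "mapping": {
        "HR Manager": ["包容性", "同理心"],
        "Remote Team Lead": ["远程工作", "同理心"],
    }
}
print(mismatched_example_keys(example_input, example_output))
# ['HR Manager', 'Remote Team Lead']
```

Running checks like these from a callback such as the `TestCallback` above would show, per request, whether the model mapped any themes to a persona at all.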

lastsummerx avatar Mar 06 '25 15:03 lastsummerx

I have the same problem, but my dataset is in English. Using Ollama works well, but using AzureOpenAI with GPT-4 models and the LlamaIndex integration doesn't generate any samples. Has anyone found a workaround?

bojackhorseman0309 avatar Jul 24 '25 16:07 bojackhorseman0309