ragas

Test data generation for function calling

Open alexHeu opened this issue 1 year ago • 1 comments

Hey, I was wondering if you think it would be possible to create a synthetic dataset for function calling tasks? I would like to use that dataset for a finetuning experiment.

Thanks for any guidance that you might have!

alexHeu • Sep 14 '24 00:09

Hey @alexHeu! I'm a bot designed to assist you with bugs, questions, and contributions. How can I help you today?

Yes, it is possible to create a synthetic dataset for function calling tasks to use in a finetuning experiment. Ragas provides tools for synthetic test data generation, which can be applied to function calling tasks. Here is a relevant code snippet for generating a synthetic test dataset:

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

TEST_SIZE = 25

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

# set question type distribution
distribution = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}
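# `documents` is not defined in this snippet; it is expected to be a list of
# LlamaIndex Document objects loaded beforehand
# generate testset
testset = generator.generate_with_llamaindex_docs(
    documents, test_size=TEST_SIZE, distributions=distribution
)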
test_df = testset.to_pandas()
test_df.head()

Ragas employs an evolutionary generation paradigm to create diverse and complex questions from provided documents. This approach ensures comprehensive coverage of various question types, including reasoning, conditioning, and multi-context, which are crucial for robust evaluation and finetuning experiments [1][2].
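
As a rough illustration of what the distribution in the snippet above implies for a test size of 25, the sketch below splits a sample budget across question types with largest-remainder rounding. The helper `allocate_counts` is hypothetical and is not how Ragas allocates samples internally; it only shows the arithmetic behind weights such as 0.5 / 0.25 / 0.25.

```python
from math import floor


def allocate_counts(distribution, test_size):
    """Split test_size across question types using largest-remainder rounding.

    Hypothetical helper for illustration; not part of Ragas.
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("distribution weights must sum to 1")

    # Exact (possibly fractional) share for each question type.
    exact = {name: weight * test_size for name, weight in distribution.items()}
    # Start from the rounded-down share.
    counts = {name: floor(value) for name, value in exact.items()}
    leftover = test_size - sum(counts.values())

    # Hand the remaining samples to the types with the largest fractional parts.
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - counts[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


counts = allocate_counts(
    {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}, 25
)
print(counts)  # {'simple': 13, 'reasoning': 6, 'multi_context': 6}
```

With 25 samples the exact shares are 12.5, 6.25, and 6.25, so one type has to absorb the rounding; here the extra sample goes to the type with the largest fractional share.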


dosubot[bot] • Sep 14 '24 00:09

@alexHeu No, Ragas currently does NOT support generating synthetic datasets for function calling/tool use tasks. Ragas synthetic data generation is limited to RAG-style Q&A scenarios.
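
To make the distinction concrete, here is a minimal sketch contrasting a RAG-style Q&A record with the kind of record a function calling dataset would need. This is not a Ragas feature: both record shapes, every field name, and the `is_consistent` helper are assumptions made up for illustration, with the tool definition loosely following the JSON Schema style commonly used for tool parameters.

```python
import json

# Hypothetical RAG-style Q&A record: a question, retrieved contexts, an answer.
rag_sample = {
    "question": "What is the refund window?",
    "contexts": ["Refunds are accepted within 30 days of purchase."],
    "ground_truth": "30 days",
}

# Hypothetical function calling record: a user message, the tools on offer,
# and the call the model is expected to emit.
function_call_sample = {
    "user_message": "What's the weather in Berlin in celsius?",
    "tools": [
        {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        }
    ],
    "expected_call": {
        "name": "get_weather",
        "arguments": {"city": "Berlin", "unit": "celsius"},
    },
}


def is_consistent(sample):
    """Return True if the expected call names a declared tool, supplies all
    required parameters, and uses no undeclared ones."""
    tools = {tool["name"]: tool for tool in sample["tools"]}
    call = sample["expected_call"]
    tool = tools.get(call["name"])
    if tool is None:
        return False
    schema = tool["parameters"]
    args = call["arguments"]
    has_required = all(key in args for key in schema.get("required", []))
    only_known = all(key in schema["properties"] for key in args)
    return has_required and only_known


print(is_consistent(function_call_sample))
print(json.dumps(function_call_sample["expected_call"]))
```

The RAG record has no notion of tool definitions or structured arguments, which is why a Q&A-oriented generator does not produce usable function calling samples on its own.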

anistark • Nov 03 '25 07:11

It seems the issue was answered, closing this now.

github-actions[bot] • Nov 07 '25 00:11