ragas
Origin of Persona-Based Synthetic Test Dataset Generation
[x] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
Does the idea of generating questions based on personas originate from the Eval-Instruct paper or the "Scaling Synthetic Data Creation with 1,000,000,000 Personas" paper, or is it a custom implementation? Are there scientific references I can cite for this concept in my work?
Code Examples
import typing as t

from ragas.prompt import PydanticPrompt, StringIO

# Persona (fields: name, role_description) is a pydantic model defined in ragas'
# testset persona module; its exact import path may differ between versions.


class PersonaGenerationPrompt(PydanticPrompt[StringIO, Persona]):
    instruction: str = (
        "Using the provided summary, generate a single persona who would likely "
        "interact with or benefit from the content. Include a unique name and a "
        "concise role description of who they are."
    )
    input_model: t.Type[StringIO] = StringIO
    output_model: t.Type[Persona] = Persona
    # Few-shot example: a document summary paired with the persona it should yield
    examples: t.List[t.Tuple[StringIO, Persona]] = [
        (
            StringIO(
                text="Guide to Digital Marketing explains strategies for engaging audiences across various online platforms."
            ),
            Persona(
                name="Digital Marketing Specialist",
                role_description="Focuses on engaging audiences and growing the brand online.",
            ),
        )
    ]

# QueryCondition and GeneratedQueryAnswer are pydantic models defined elsewhere in
# ragas (alongside the single-hop synthesizer); their imports are omitted here.
class QueryAnswerGenerationPrompt(PydanticPrompt[QueryCondition, GeneratedQueryAnswer]):
    instruction: str = (
        "Generate a single-hop query and answer based on the specified conditions (persona, term, style, length) "
        "and the provided context. Ensure the answer is entirely faithful to the context, using only the information "
        # Trailing newlines keep the next header from being glued onto this sentence
        "directly from the provided context.\n\n"
        "### Instructions:\n"
        "1. **Generate a Query**: Based on the context, persona, term, style, and length, create a question "
        "that aligns with the persona's perspective and incorporates the term.\n"
        "2. **Generate an Answer**: Using only the content from the provided context, construct a detailed answer "
        "to the query. Do not add any information not included in or inferable from the context.\n"
    )
    input_model: t.Type[QueryCondition] = QueryCondition
    output_model: t.Type[GeneratedQueryAnswer] = GeneratedQueryAnswer
    # Few-shot example: a persona-conditioned request and the faithful query/answer it should yield
    examples: t.List[t.Tuple[QueryCondition, GeneratedQueryAnswer]] = [
        (
            QueryCondition(
                persona=Persona(
                    name="Software Engineer",
                    role_description="Focuses on coding best practices and system design.",
                ),
                term="microservices",
                query_style="Formal",
                query_length="Medium",
                context="Microservices are an architectural style where applications are structured as a collection of loosely coupled services. "
                "Each service is fine-grained and focuses on a single functionality.",
            ),
            GeneratedQueryAnswer(
                query="What is the purpose of microservices in software architecture?",
                answer="Microservices are designed to structure applications as a collection of loosely coupled services, each focusing on a single functionality.",
            ),
        ),
    ]
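To make the two prompts above concrete, here is a minimal, stdlib-only sketch of the pipeline they imply: a persona is parsed from the LLM's JSON reply to the persona prompt, then combined with terms, query styles and lengths into conditions for the query/answer prompt. `Persona` and `QueryCondition` are plain dataclass stand-ins for the ragas pydantic models, and `parse_persona`, `build_conditions` and `render_condition` are hypothetical helpers, not ragas functions. Enumerating every combination with `itertools.product` is an assumption for illustration; ragas may sample combinations differently.

```python
# Hypothetical sketch: dataclass stand-ins for ragas' pydantic models, not ragas' implementation.
import json
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Persona:
    # Same two fields as the Persona examples above
    name: str
    role_description: str


@dataclass(frozen=True)
class QueryCondition:
    # Same fields as the QueryCondition example above
    persona: Persona
    term: str
    query_style: str
    query_length: str
    context: str


def parse_persona(raw_json: str) -> Persona:
    """Turn an LLM's JSON reply to the persona prompt into a Persona (stand-in for pydantic validation)."""
    data = json.loads(raw_json)
    return Persona(name=data["name"], role_description=data["role_description"])


def build_conditions(personas, terms, styles, lengths, context) -> List[QueryCondition]:
    """Return one condition per (persona, term, style, length) combination for one context chunk."""
    return [
        QueryCondition(persona, term, style, length, context)
        for persona, term, style, length in product(personas, terms, styles, lengths)
    ]


def render_condition(cond: QueryCondition) -> str:
    """Flatten a condition into the text an LLM would be asked to answer (hypothetical layout)."""
    return (
        f"Persona: {cond.persona.name} - {cond.persona.role_description}\n"
        f"Term: {cond.term}\n"
        f"Style: {cond.query_style}\n"
        f"Length: {cond.query_length}\n"
        f"Context: {cond.context}"
    )


if __name__ == "__main__":
    engineer = parse_persona(
        '{"name": "Software Engineer", '
        '"role_description": "Focuses on coding best practices and system design."}'
    )
    marketer = Persona(
        "Digital Marketing Specialist",
        "Focuses on engaging audiences and growing the brand online.",
    )
    context = (
        "Microservices are an architectural style where applications are "
        "structured as a collection of loosely coupled services."
    )
    conditions = build_conditions(
        [engineer, marketer], ["microservices"], ["Formal", "Conversational"], ["Short", "Medium"], context
    )
    print(len(conditions))  # 2 personas x 1 term x 2 styles x 2 lengths = 8
    print(render_condition(conditions[0]))
```

Running it prints 8 followed by the rendered first condition (the Software Engineer persona, Formal style, Short length). The point of the design is that the persona is just one more conditioning axis alongside term, style and length, so the number of distinct queries grows multiplicatively with the number of personas.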