Break up judging prompts to take advantage of implicit caching
Describe the Feature
Ragas prompts appear to repeat a lot of content on each invocation, namely the task definition and the few-shot examples; the context-specific data is only appended at the end. This doesn't take advantage of the implicit caching that providers like OpenAI or Vertex AI / Gemini perform when the message is broken up into parts. From a cost perspective it would be better to break these prompts into multiple parts (or even provide the instructions/examples as a system instruction) so that the repeated guidelines can be cached on the provider side.

Why is the feature important for you?
It would reduce costs.

Additional context
Here's an example I saw when I traced the evaluation prompt. My point is simply that all the repeated instructions up to "Now perform the same with the following input" are ripe for implicit caching, since we will send the same prefix for many evaluation items:
```
[1 Items
0: {2 Items
role: "user"
content: "Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"$defs": {"StatementFaithfulnessAnswer": {"properties": {"statement": {"description": "the original statement, word-by-word", "title": "Statement", "type": "string"}, "reason": {"description": "the reason of the verdict", "title": "Reason", "type": "string"}, "verdict": {"description": "the verdict(0/1) of the faithfulness.", "title": "Verdict", "type": "integer"}}, "required": ["statement", "reason", "verdict"], "title": "StatementFaithfulnessAnswer", "type": "object"}}, "properties": {"statements": {"items": {"$ref": "#/$defs/StatementFaithfulnessAnswer"}, "title": "Statements", "type": "array"}}, "required": ["statements"], "title": "NLIStatementOutput", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"context": "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects.",
"statements": [
"John is majoring in Biology.",
"John is taking a course on Artificial Intelligence.",
"John is a dedicated student.",
"John has a part-time job."
]
}
Output: {
"statements": [
{
"statement": "John is majoring in Biology.",
"reason": "John's major is explicitly mentioned as Computer Science. There is no information suggesting he is majoring in Biology.",
"verdict": 0
},
{
"statement": "John is taking a course on Artificial Intelligence.",
"reason": "The context mentions the courses John is currently enrolled in, and Artificial Intelligence is not mentioned. Therefore, it cannot be deduced that John is taking a course on AI.",
"verdict": 0
},
{
"statement": "John is a dedicated student.",
"reason": "The context states that he spends a significant amount of time studying and completing assignments. Additionally, it mentions that he often stays late in the library to work on his projects, which implies dedication.",
"verdict": 1
},
{
"statement": "John has a part-time job.",
"reason": "There is no information given in the context about John having a part-time job.",
"verdict": 0
}
]
}
Example 2
Input: {
"context": "Photosynthesis is a process used by plants, algae, and certain bacteria to convert light energy into chemical energy.",
"statements": [
"Albert Einstein was a genius."
]
}
Output: {
"statements": [
{
"statement": "Albert Einstein was a genius.",
"reason": "The context and statement are unrelated",
"verdict": 0
}
]
}
-----------------------------
Now perform the same with the following input
input: {
"context": "Europe is a continent, not a country, so it doesn't have a capital. However, some cities are often considered important centers in Europe:\n\n- **Brussels, Belgium** is often called the \"capital of Europe\" because it hosts major institutions of the European Union (EU), including the European Commission, the European Council, and sessions of the European Parliament.\n- **Strasbourg, France** also hosts sessions of the European Parliament.\n- **Luxembourg City, Luxembourg** is home to several EU institutions as well.\n\nIf you meant the capital of a specific European country, please let me know!",
"statements": [
"Europe does not have a single official capital city.",
"Brussels is in Belgium.",
"Brussels is widely considered the de facto capital of Europe.",
"Brussels is considered the de facto capital because it is the seat of many important European Union institutions.",
"The European Commission is located in Brussels.",
"The European Parliament is located in Brussels.",
"The Council of the European Union is located in Brussels."
]
}
Output: "
}
]
```
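For illustration, here is a rough sketch of how the same judge call could be restructured so that the static part forms a stable prefix. This deliberately uses the OpenAI chat API directly rather than the Ragas internals; the model name and the truncated `STATIC_JUDGE_PROMPT` text are placeholders standing in for the instructions and examples shown in the trace above. OpenAI, for example, documents automatic prompt caching for prompts above roughly 1,024 tokens when the prefix is identical across requests.

```python
# Sketch only, not the actual Ragas implementation.
# Idea: keep everything that never changes (task definition, output schema,
# few-shot examples) in a leading system message that is byte-identical on
# every call, and put only the per-item data in the trailing user message.
# Providers with implicit prefix caching (OpenAI, Gemini) can then reuse the
# shared prefix across evaluation items.
import json

from openai import OpenAI

client = OpenAI()

# Static prefix: task instructions + JSON schema + few-shot examples
# (truncated placeholder for the full text shown in the trace above).
STATIC_JUDGE_PROMPT = (
    "Your task is to judge the faithfulness of a series of statements based "
    "on a given context. ...\n"
    "--------EXAMPLES-----------\n"
    "... (the two examples from the trace) ...\n"
    "-----------------------------"
)


def judge_faithfulness(context: str, statements: list[str]) -> str:
    """Judge one item; only this user message varies between calls."""
    item = {"context": context, "statements": statements}
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": STATIC_JUDGE_PROMPT},
            {
                "role": "user",
                "content": "Now perform the same with the following input\n"
                + json.dumps(item, indent=2),
            },
        ],
    )
    return response.choices[0].message.content
```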
Hi @aachkar-samsara @anistark, I'd like to work on this issue. From what I understand, the evaluation prompts repeat a lot of instructions/examples for each input, which prevents implicit caching by providers like OpenAI or Vertex AI and impacts cost and efficiency.
My approach:
- Keep repeated instructions/examples as a system prompt.
- Send only the context-specific input each time.
- Verify outputs remain correct while benefiting from caching.
Would it be okay for me to start?
cc: @jjmachan
Personally, I'm in favor of exploring the LLM caching approach more.
System prompts would break the BaseRagasLLM interface and might not work for all providers.
Understood, I'll look into the caching approach while keeping compatibility in mind. Thanks for the guidance, @anistark.
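As a starting point, here is a minimal sketch of what a client-side cache around the judge call could look like; the `CachedJudge` wrapper and key scheme are hypothetical and not the existing Ragas cache API. This only saves cost when an identical item is judged again (e.g. on re-runs), which is a different saving from provider-side prefix caching, so I'll compare both.

```python
# Sketch only: a generic client-side response cache, independent of any
# provider-side implicit caching. Names are hypothetical, not Ragas APIs.
import hashlib
import json
from typing import Callable


class CachedJudge:
    """Memoise judge responses, keyed on the exact serialized input."""

    def __init__(self, judge_fn: Callable[[str, list[str]], str]):
        self._judge_fn = judge_fn  # e.g. a callable wrapping the LLM call
        self._cache: dict[str, str] = {}

    def __call__(self, context: str, statements: list[str]) -> str:
        key = hashlib.sha256(
            json.dumps(
                {"context": context, "statements": statements}, sort_keys=True
            ).encode()
        ).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._judge_fn(context, statements)
        return self._cache[key]
```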