feat: extend PromptBuilder and deprecate DynamicPromptBuilder
Related Issues
Currently we cannot have both:
- a default prompt template defined (PromptBuilder)
- the ability to dynamically change prompt templates at runtime (DynamicPromptBuilder)
There are two options:
- A: we extend `DynamicPromptBuilder` and leave `PromptBuilder` as is
- B: we extend `PromptBuilder` and deprecate `DynamicPromptBuilder`
Edit 07.05.: We decided to go with B
This is Option B. See https://github.com/deepset-ai/haystack/pull/7652 for Option A.
Proposed Changes:
This extends PromptBuilder to change prompts at query time.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder

default_template = "This is the default prompt: \n Query: {{query}}"
prompt_builder = PromptBuilder(template=default_template)
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
# using the default prompt
result = pipe.run(
data={
"prompt_builder": {
"query": "Where does the speaker live?",
},
}
)
# "This is the default prompt: \n Query: Where does the speaker live?"
# using the dynamic prompt
result = pipe.run(
data={
"prompt_builder": {
"template": "This is the dynamic prompt:\n Query: {{query}}",
"query": "Where does the speaker live?",
},
}
)
# "This is the dynamic prompt: \n Query: Where does the speaker live?"
How did you test it?
- added tests
Notes for the reviewer
- ~~There is a breaking change: the `required_variables` param has been changed to `optional_variables`, as most template variables are required anyway. We can undo that if necessary.~~
- `DynamicPromptBuilder` is being deprecated
- The Chat counterpart to `PromptBuilder` is implemented in https://github.com/deepset-ai/haystack/pull/7663
Checklist
- I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.
- I documented my code
- I ran pre-commit hooks and fixed any issue
We decided to go with this approach B.
I've removed all breaking changes. PromptBuilder should behave the same as before, extended by the dynamic template functionality.
Pull Request Test Coverage Report for Build 9172968594
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.02%) to 90.575%
| Totals | |
|---|---|
| Change from base Build 9129529675: | 0.02% |
| Covered Lines: | 6602 |
| Relevant Lines: | 7289 |
💛 - Coveralls
Chat counterpart is being implemented in https://github.com/deepset-ai/haystack/pull/7663
@tstadel code is solid, my main concern is how to explain this to a user (cc @dfokina ) so that everything is clear and easily digested. Here is my proposal, see if it is easier for you to comprehend as well and adjust accordingly in the class pydocs and elsewhere in the documentation:
The PromptBuilder component provides a flexible way to generate prompts using Jinja2 templates. It can be used either standalone or as a part of a pipeline, allowing for both static and dynamic prompt generation.
Using PromptBuilder Standalone
You can use PromptBuilder with a static template provided at initialization or override it at runtime:
- Static template usage: Define a template at initialization and pass the relevant variables directly to the `run` method.

from haystack.components.builders import PromptBuilder

template = "Translate the following context to {{ target_language }}. Context: {{ snippet }}; Translation:"
builder = PromptBuilder(template=template)
result = builder.run(target_language="spanish", snippet="I can't speak spanish.")
print(result)
# Output:
# {'prompt': "Translate the following context to spanish. Context: I can't speak spanish.; Translation:"}
- Dynamic template usage: Override the static template by providing a new template directly to the `run` method.

template = "Translate the following context to {{ target_language }}. Context: {{ snippet }}; Translation:"
builder = PromptBuilder(template=template)
summary_template = "Translate to {{ target_language }} and summarize the following context. Context: {{ snippet }}; Summary:"
result = builder.run(target_language="spanish", snippet="I can't speak spanish.", template=summary_template)
print(result)
# Output:
# {'prompt': "Translate to spanish and summarize the following context. Context: I can't speak spanish.; Summary:"}
Using PromptBuilder in a pipeline
Static template in a pipeline:
When using a static template in a pipeline, define the variables argument during initialization to allow input slots for other components to pass data into PromptBuilder, as in, for example, variables=["documents"] in the PromptBuilder initialization below.
from typing import List
from haystack import Pipeline, component, Document
from haystack.utils import Secret
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
static_template = "Summarize the following context in {{ target_language }}: {{ documents[0].content }}"
prompt_builder = PromptBuilder(template=static_template, variables=["documents"])
llm = OpenAIGenerator(api_key=Secret.from_token("<your-api-key>"), model="gpt-3.5-turbo")
@component
class DocumentProducer:
@component.output_types(documents=List[Document])
def run(self, doc_input: str):
return {"documents": [Document(content=doc_input)]}
pipe = Pipeline()
pipe.add_component("doc_producer", DocumentProducer())
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("doc_producer.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")
result = pipe.run(
data={
"doc_producer": {"doc_input": "This is a test document about Berlin."},
"prompt_builder": {"template_variables": {"target_language": "Spanish"}},
}
)
print(result)
# Output:
# {'llm': {'replies': ['Este es un documento de prueba sobre Berlín.'],
# 'meta': [{'model': 'gpt-3.5-turbo-0613',
# 'index': 0,
# 'finish_reason': 'stop',
# 'usage': {'prompt_tokens': 28,
# 'completion_tokens': 8,
# 'total_tokens': 36}}]}}
Note how, when PromptBuilder is used in a pipeline, some of the template variable values come from other components in the pipeline (e.g., documents coming from DocumentProducer), while others are passed directly by the user to the pipeline run invocation via template_variables (e.g., target_language specified directly by the user).
Dynamic template in a pipeline:
For dynamic template usage, we also define the variables argument during initialization to allow input slots for other components to pass data into PromptBuilder, as in, for example, variables=["documents"] in the PromptBuilder initialization below.
from typing import List
from haystack import Pipeline, component, Document
from haystack.utils import Secret
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
prompt_builder = PromptBuilder(variables=["documents"])
llm = OpenAIGenerator(api_key=Secret.from_token("<your-api-key>"), model="gpt-3.5-turbo")
@component
class DocumentProducer:
@component.output_types(documents=List[Document])
def run(self, doc_input: str):
return {"documents": [Document(content=doc_input)]}
pipe = Pipeline()
pipe.add_component("doc_producer", DocumentProducer())
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("doc_producer.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")
dynamic_template = "Here is the document: {{documents[0].content}} \n Answer: {{query}}"
result = pipe.run(
data={
"doc_producer": {"doc_input": "Hello world, I live in Berlin"},
"prompt_builder": {
"template": dynamic_template,
"template_variables": {"query": "Where does the speaker live?"},
},
}
)
print(result)
# Output:
# {'llm': {'replies': ['The speaker lives in Berlin.'],
# 'meta': [{'model': 'gpt-3.5-turbo-0613',
# 'index': 0,
# 'finish_reason': 'stop',
# 'usage': {'prompt_tokens': 28,
# 'completion_tokens': 6,
# 'total_tokens': 34}}]}}
Note how dynamic PromptBuilder pipeline usage is very similar to static usage, except that in the pipeline run parameters we pass a new template via PromptBuilder's template parameter. Note, however, that this new template also takes the documents value coming from other components. We cannot redefine at runtime the input data slots coming from other components; therefore, our new template also uses a documents variable.
Important concepts to remember
- Template variables vs. pipeline variables:
  - Template variables: specified by the user directly via the `template_variables` argument to the `run` method.
  - Pipeline variables: passed indirectly from other components through the pipeline graph and declared (their names only) via the `variables` argument during initialization.
- Static vs. dynamic templates:
  - Static templates are set during initialization.
  - Dynamic templates can override static templates at runtime through the `run` method's `template` parameter.
- Variable precedence:
  - Variables provided by the user directly via `template_variables` take precedence over those coming from other components in the pipeline (kwargs).
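To make the precedence rule concrete, the resolution can be sketched as a plain dictionary merge (an illustration of the described behavior, not the component's actual implementation): pipeline-provided kwargs form the base, and user-supplied `template_variables` win on conflicts.

```python
def resolve_template_variables(pipeline_kwargs, template_variables=None):
    # Pipeline-provided values (input slots filled by other components) form the base ...
    merged = dict(pipeline_kwargs)
    # ... and user-supplied template_variables overwrite them on conflicts.
    merged.update(template_variables or {})
    return merged

resolved = resolve_template_variables(
    {"documents": ["doc from retriever"], "target_language": "English"},
    {"target_language": "Spanish"},
)
print(resolved)
# {'documents': ['doc from retriever'], 'target_language': 'Spanish'}
```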
Based on the documentation provided, my questions are:
- When using a static prompt, do we have to use the `variables` parameter with a PromptBuilder in a pipeline? If so, this is a breaking change:
static_template = "Summarize the following context in {{ target_language }}: {{ documents[0].content }}"
prompt_builder = PromptBuilder(template=static_template, variables=["documents"])
- Can we drop the "template_variables" key as we provide values for the variables? With the suggested version, this is a breaking change
result = pipe.run(
data={
"doc_producer": {"doc_input": "This is a test document about Berlin."},
"prompt_builder": {"template_variables": {"target_language": "Spanish"}},
}
)
- Can we eliminate the `variables` parameter even in a dynamically prompted setting by compromising on the validation?
@bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.
- You don't have to use it; it is optional. No `variables` -> no other components providing data (e.g. documents) to PB.
- It is an additional feature; to me it doesn't seem like breaking, @tstadel will confirm
- We need to use `variables` whenever other components provide PromptBuilder with template variables data/values. Without it, PB is not that usable in pipeline settings.
@bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.
- You don't have to use it; it is optional. No `variables` -> no other components providing data (e.g. documents) to PB.
- It is an additional feature; to me it doesn't seem like breaking, @tstadel will confirm
- We need to use `variables` whenever other components provide PromptBuilder with template variables data/values. Without it, PB is not that usable in pipeline settings.
Almost:
If you pass `template` but not `variables`, input slots will be inferred from `template` as before (no breaking change!). So:
- you don't have to use `variables` at all if you are good with the input slots inferred from `template`
- if you don't pass `template`, you have to pass `variables` in order to use input slots in dynamic templates; there is no other way to define them
- `template_variables` is optional; you'll never be forced to define them
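The inference described above can be illustrated with Jinja's own parser (an assumption for illustration: the snippet below is not PromptBuilder's actual code, but `jinja2.meta.find_undeclared_variables` is the standard Jinja2 way to collect a template's undeclared variables):

```python
from jinja2 import Environment, meta

template = "Here is the document: {{ documents[0].content }}\nAnswer: {{ query }}"
# Parse the template and collect every variable it expects but does not define itself.
ast = Environment().parse(template)
inferred = meta.find_undeclared_variables(ast)
print(sorted(inferred))  # ['documents', 'query']
```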
Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts.
What's going on:
- Solution B that @tstadel suggests extends the PromptBuilder to do the following:
  - Basically, `template` becomes not only an initialization argument but also a runtime variable for `PromptBuilder`
  - When a user 'overrides' `template` at `.run()`, they may also change prompt input variables (like document, query) - is this inferred? Can I just override it with whatever variable and run it?
What I am worried about:
- If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide `variables` separately to the `.run()`, correct?
- This would be quite complex to explain to users imo. If there's any way to avoid making it so that `variables` of any kind have to be provided separately, I would suggest we do that.
Please educate me here though, maybe I'm misunderstanding something.
- If no other components can provide data otherwise, then the `variables` parameter becomes a must in most pipelines, such as RAG
- If I can eliminate "template_variables" and pass `data={"prompt_builder": {"target_language": "Spanish"}}` instead of `data={"prompt_builder": {"template_variables": {"target_language": "Spanish"}}}`, it's great. But the example code doesn't imply that.
Here's my understanding of how to use a static prompt with PromptBuilder in a pipeline. @tstadel please confirm 🙏
Before
The current implementation of a RAG pipeline:
from haystack import Pipeline

template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
    "retriever": {"query": query},
    "prompt_builder": {"query": query}
})

After
With this PR, the updated pipeline will look like this:
from haystack import Pipeline

template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
prompt_builder = PromptBuilder(template=template, variables=["documents"])
basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
    "retriever": {"query": query},
    "prompt_builder": {"template_variables": {"query": query}}
})

1 - I added `variables=["documents"]` to my PromptBuilder because I'll inject `documents` coming from the retriever
2 - I added the "template_variables" key as I run the pipeline
Fortunately no :-) It will work exactly as before.
Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have variables..
In initialization, we optionally provide variables (my understanding is that this is for when we override the template, yes?)
But then, in the run function, we need to provide template_variables? Wouldn't these 2 be the same thing?
Ok so:
- I can use the `PromptBuilder` exactly the same as before, without providing variables/template variables at all, even if say a retriever is forwarding `documents` to it in pipeline.connect()
- I will have to provide variables if I'm overriding `template`
- One thing I just don't yet fully understand is when we would use `template_variables` vs `variables` and what the difference is (even if you say we don't need to use `template_variables`)
@tstadel - thanks for the explanations!!! Really helps
No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user in run time
Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts: What's going on:
- Solution B that @tstadel suggests extends the PromptBuilder to do the following:
- Basically, `template` becomes not only an initialization argument but also a runtime variable for `PromptBuilder`
- When a user 'overrides' `template` at `.run()`, they may also change prompt input variables (like document, query) - is this inferred? Can I just override it with whatever variable and run it?
What I am worried about:
- If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide `variables` separately to the `.run()`, correct?
- This would be quite complex to explain to users imo. If there's any way to avoid making it so that `variables` of any kind have to be provided separately, I would suggest we do that.
Please educate me here though, maybe I'm misunderstanding something.
@TuanaCelik @bilgeyucel @vblagoje Ok here is an illustrative example that should help shed light on what's not obvious:
@bilgeyucel 's example
from haystack import Pipeline
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")
query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
"retriever": {"query": query},
"prompt_builder": {"query": query}
})
Here the following input slots are inferred from template:
- `documents`
- `query`
Now let's change the template at runtime, keeping the same variables:
fancy_template = """
This is a super fancy dynamic template:
Documents:
{% for document in documents %}
Document {{ document.id }}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
"retriever": {"query": query},
"prompt_builder": {"query": query, "template": fancy_template}
})
Then this will work seamlessly as we use the same input slots:
- `documents`
- `query`
Now there are two more cases for dynamic templates: Case A) We use fewer input slots than during init:
query_only_template = """
Question: {{query}}
Answer:
"""
query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
"retriever": {"query": query},
"prompt_builder": {"query": query, "template": query_only_template}
})
This will also work seamlessly as all template variables (i.e. query) are covered by input slots.
Case B) We use more input slots than during init:
even_fancier_template = """
{{ header }}
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
"retriever": {"query": query},
"prompt_builder": {"query": query, "template": even_fancier_template}
})
Note that the passed template now requires:
- `documents`
- `query`
- `header`
The first two are covered by input slots, but the third, `header`, is not. That means there is no way to pass `header` through pipeline params. There are two options to set `header` now:
Case B1)
Set header via template_variables:
even_fancier_template = """
{{ header }}
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
"retriever": {"query": query},
"prompt_builder": {"query": query, "template": even_fancier_template, "template_variables": {"header": header}}
})
Case B2)
Define header as input slot via variables at init:
from haystack import Pipeline
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template, variables=["query", "documents", "header"]))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")
even_fancier_template = """
{{ header }}
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
"retriever": {"query": query},
"prompt_builder": {"query": query, "template": even_fancier_template, "header": header}
})
Note, that variables are set to:
- `documents`
- `query`
- `header`
Hence, we can pass header to prompt_builder via pipeline.
No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user in run time
@vblagoje please don't forget that variables are being inferred from template if template is set, but variables is not.
Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have `variables`. In initialization, we optionally provide `variables` (my understanding, this is for when we override the template, yes?) But then, in the run function, we need to provide `template_variables`? Wouldn't these 2 be the same thing?
@TuanaCelik
I wouldn't mix them up, as variables just define the variables that prompt builder instance expects to receive from the pipeline. template_variables on the other hand overwrite or extend pipeline provided variables by user defined values.
Maybe we can find a better name for template_variables here.
@vblagoje The new documentation / explanation approach would look like this. We start with https://docs.haystack.deepset.ai/docs/promptbuilder and keep it the same*. We add the following sections:
Changing the template at runtime (Prompt Engineering)
PromptBuilder allows you to switch the prompt template of an existing pipeline. The example below builds on top of the pipeline from the previous section, invoking the existing pipeline with a new prompt template:
documents = [
Document(content="Joe lives in Berlin", meta={"name": "doc1"}),
Document(content="Joe is a software engineer", meta={"name": "doc1"}),
]
new_template = """
You are a helpful assistant.
Given these documents, answer the question.
Documents:
{% for doc in documents %}
Document {{ loop.index }}:
Document name: {{ doc.meta['name'] }}
{{ doc.content }}
{% endfor %}
Question: {{ query }}
Answer:
"""
p.run({
"prompt_builder": {
"documents": documents,
"query": question,
"template": new_template,
},
})
If you want to use different variables during prompt engineering than in the default template, you can do so by setting PromptBuilder's variables init parameter accordingly.
Overwriting variables at runtime
In case you want to overwrite the values of variables, you can use template_variables during runtime as illustrated below:
language_template = """
You are a helpful assistant.
Given these documents, answer the question.
Documents:
{% for doc in documents %}
Document {{ loop.index }}:
Document name: {{ doc.meta['name'] }}
{{ doc.content }}
{% endfor %}
Question: {{ query }}
Please provide your answer in {{ answer_language | default('English') }}
Answer:
"""
p.run({
"prompt_builder": {
"documents": documents,
"query": question,
"template": language_template,
"template_variables": {"answer_language": "German"},
},
})
Note that language_template introduces the variable answer_language, which is not bound to any pipeline variable. If not set otherwise, it would evaluate to its default value 'English'. In this example we overwrite its value with 'German'.
template_variables allows you to overwrite pipeline variables (such as documents) as well.
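As a quick check of the `default` filter behavior relied on above (plain Jinja2, independent of Haystack; the template string is trimmed down for illustration):

```python
from jinja2 import Template

t = Template("Please provide your answer in {{ answer_language | default('English') }}")
# Unbound variable falls back to the filter's default ...
print(t.render())                          # Please provide your answer in English
# ... and an explicitly passed value overrides it.
print(t.render(answer_language="German"))  # Please provide your answer in German
```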
* = except for the already broken examples
Hey @vblagoje @tstadel this last message with the docs suggestions looks reasonable to me, and the idea is pretty easy to understand :) We can adjust the examples slightly to fit into the docs, and it would look good.
I also very much like this user-perspective driven documentation rather than what I first suggested. And would even merge this straight into main. But let's proceed forward with what we all agree.
Shall we use the above written user perspective description in class pydocs as well @tstadel ?
I also very much like this user-perspective driven documentation rather than what I first suggested. And would even merge this straight into main. But let's proceed forward with what we all agree.
Shall we use the above written user perspective description in class pydocs as well @tstadel ?
@vblagoje Yes, why not. I can update it.
@vblagoje pydocs have been updated.