complete `addon_params` with all prompt templates
Description
Today you can only control the language and a very broad set of entity types.
This PR exposes all prompt template keys by adding them to `addon_params`, allowing easy customization of extraction prompts.
This gives more fine-grained control over prompts: for example, you could instantiate separate objects with dedicated `addon_params` for certain types of text, using more suitable, domain-relevant few-shot examples.
It also opens the way to impose more structure, e.g. via an ontology or (causal) relations.
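As a sketch of what this enables (the domain, key values, and variable name below are illustrative, not part of the PR), a domain-specific instantiation could look like:

```python
# Hypothetical addon_params for a medical-text corpus; the keys mirror the
# newly exposed PROMPTS entries, the values are made-up domain examples.
medical_addon_params = {
    "language": "English",
    "entity_types": ["disease", "drug", "gene", "symptom"],
    "entity_extraction_examples": [
        "Example 1: ... extract (disease: 'asthma'), (drug: 'salbutamol') ..."
    ],
}

# Each corpus then gets its own instance with its own prompt customization,
# e.g. rag = LightRAG(working_dir=..., addon_params=medical_addon_params)
```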
Related Issues
None
Changes Made
Added the following keys (exposed in `addon_params` without the `DEFAULT_` prefix):

- `PROMPTS["DEFAULT_TUPLE_DELIMITER"]`
- `PROMPTS["DEFAULT_RECORD_DELIMITER"]`
- `PROMPTS["DEFAULT_COMPLETION_DELIMITER"]`
- `PROMPTS["summarize_entity_descriptions"]`
- `PROMPTS["entity_extraction_examples"]`
- `PROMPTS["entity_extraction"]`
- `PROMPTS["entity_continue_extraction"]`
- `PROMPTS["entity_if_loop_extraction"]`
- `PROMPTS["keywords_extraction_examples"]`
- `PROMPTS["keywords_extraction"]`
- `PROMPTS["mix_rag_response"]`
- `PROMPTS["naive_rag_response"]`
- `PROMPTS["similarity_check"]`
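For reference, the lookup order this implies can be sketched as follows. This is a minimal stand-in, assuming `addon_params` keys override the global `PROMPTS` defaults; `resolve_prompt` is a hypothetical helper, not LightRAG API:

```python
# Tiny stand-in for the real PROMPTS dictionary in prompts.py.
PROMPTS = {
    "DEFAULT_TUPLE_DELIMITER": "<|>",
    "entity_extraction": "-Goal- Extract entities from {input_text} ...",
}


def resolve_prompt(key, addon_params, prompts=PROMPTS):
    """Return the prompt for `key`: a user override in addon_params wins,
    otherwise fall back to the global default. Delimiters are stored with a
    DEFAULT_ prefix in PROMPTS but exposed without it in addon_params."""
    if key in addon_params:
        return addon_params[key]
    if key in prompts:
        return prompts[key]
    return prompts[f"DEFAULT_{key.upper()}"]


# Defaults apply until a caller supplies an override:
resolve_prompt("tuple_delimiter", {})                         # -> "<|>"
resolve_prompt("tuple_delimiter", {"tuple_delimiter": "||"})  # -> "||"
```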
Checklist
- [x] Changes tested locally
- [x] Code reviewed
- [x] Documentation updated (if necessary)
- [x] Unit tests added (if applicable)
Additional Notes
@danielaskdd Please review. I left it simple and just added to `addon_params`, but this could also be grouped into a `prompts` dict.
@danielaskdd ready to merge if you want.
This is the intention. Once all prompt templates are exposed with this PR you could do this:
Directory structure:

```
my_docs/
├── books/
│   ├── book1.txt
│   └── book2.txt
└── articles/
    ├── article1.txt
    ├── article2.txt
    └── insert_template_prompts.json
my_queries/
└── articles/
    └── query_template_prompts.json
```
```python
import asyncio
import json
from typing import Optional

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete
from lightrag.kg.shared_storage import initialize_pipeline_status

WORKING_DIR = "./rag_storage"


async def initialize_rag(addon_params: Optional[dict] = None):
    rag_kwargs = {
        "working_dir": WORKING_DIR,
        "llm_model_func": gpt_4o_mini_complete,
    }
    # Only add addon_params if provided by the caller; otherwise it would
    # override the default_factory (still fine: the default language is
    # pulled from PROMPTS).
    if addon_params is not None:
        rag_kwargs["addon_params"] = addon_params
    rag = LightRAG(**rag_kwargs)
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag
```
```python
# Create the file-based example templates
with open("./my_docs/articles/insert_template_prompts.json", "w") as f:
    json.dump(
        {"entity_extraction_examples": ["device", "make", "model", "publication", "date"]},
        f,
    )
with open("./my_queries/articles/query_template_prompts.json", "w") as f:
    json.dump({"rag_response": "System prompt specific to articles..."}, f)
```
```python
with open("./my_docs/articles/insert_template_prompts.json") as f:
    articles_addon_params = json.load(f)
with open("./my_queries/articles/query_template_prompts.json") as f:
    articles_system_prompts = json.load(f)

docs = {
    "books": {
        "file_paths": ["./my_docs/books/book1.txt", "./my_docs/books/book2.txt"],
        "addon_params": {
            "entity_extraction_examples": ["organization", "person", "location"],
        },
        "system_prompts": {
            "rag_response": "KG mode system prompt specific to books...",
            "naive_rag_response": "Naive mode system prompt specific to books...",
            "mix_rag_response": "Mix mode system prompt specific to books...",
        },
    },
    "articles": {
        "file_paths": ["./my_docs/articles/article1.txt", "./my_docs/articles/article2.txt"],
        "addon_params": articles_addon_params,
        "system_prompts": articles_system_prompts,
    },
}
```
```python
def get_content(file_paths):
    contents = []
    for fp in file_paths:
        with open(fp, "r", encoding="utf-8") as f:
            contents.append(f.read())
    return contents
```
```python
# Insert differently per doc type
for doc_type, doc_info in docs.items():
    file_paths = doc_info["file_paths"]
    addon_params = doc_info["addon_params"]
    # Initialize a RAG instance for each document type
    print(f"Initializing RAG for {doc_type}")
    rag = asyncio.run(initialize_rag(addon_params))
    contents = get_content(file_paths)
    rag.insert(contents, file_paths=file_paths)
```
```python
# Perform a hybrid search with queries specific to the `books` type
print(
    rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="hybrid"),
        # Use the hybrid-mode system prompt for the books type data
        system_prompt=docs["books"]["system_prompts"]["rag_response"],
    )
)
```
Of course you could write convenience functions for the template handling: template checks (are the placeholders present?) or the correct query-template associations. For example, for `local`, `global`, and `hybrid` you could specify `rag_response`, for `mix` it could be `mix_rag_response`, and for `naive` it could be `naive_rag_response`, keeping it aligned with the current `prompts.py`, and pass any of them to `system_prompt` as illustrated in the last example.
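As a sketch of such a check (a hypothetical helper; the required placeholder names and the mode mapping below are assumptions for illustration, not the authoritative set from `prompts.py`), Python's `string.Formatter` can list the `{placeholders}` a template actually contains:

```python
import string

# Assumed required placeholders per template key (illustrative only).
REQUIRED_PLACEHOLDERS = {
    "entity_extraction": {"input_text", "entity_types", "tuple_delimiter"},
    "summarize_entity_descriptions": {"entity_name", "description_list"},
}

# Possible mode -> response-template association, mirroring prompts.py naming.
MODE_TO_RESPONSE_KEY = {
    "local": "rag_response",
    "global": "rag_response",
    "hybrid": "rag_response",
    "mix": "mix_rag_response",
    "naive": "naive_rag_response",
}


def missing_placeholders(template: str, key: str) -> set:
    """Return the required placeholders a custom template forgot to include."""
    found = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return REQUIRED_PLACEHOLDERS.get(key, set()) - found


custom = "Extract all {entity_types} from the text: {input_text}"
missing_placeholders(custom, "entity_extraction")  # -> {"tuple_delimiter"}
```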
We could open a new PR for handling and checks that warn users when placeholders are missing. Since a missing placeholder would not cause a failure for now, this could serve as an illustrative example in the meantime. Should we also add the information to the README and an example?
- cleaned up and separated query from insert prompts d71ceb9
- added checks to prevent possible problems when customizing critical prompt templates 3d7b1df
- added exhaustive examples to illustrate usage 01aee34
This should address the core items. @danielaskdd PTAL when convenient.
Just in case I missed it: @danielaskdd, do you still need anything to approve this PR?
What's the status here?
The proposal here actually seems reasonable. The current design, as discussed in #1353, has some limitations, especially when we want to support dynamic prompt changes, e.g. through a UI. Using static, globally shared `PROMPTS` dictionaries introduces risks in concurrent environments, where one user's changes can unintentionally affect another's session.
Moving toward passing prompts explicitly, e.g. via a param, would solve this scoping issue and is easy to use; it feels quite natural.
@drahnreb could you clarify the reasoning behind the design you've used?
```python
rag = asyncio.run(initialize_rag(addon_params))  # <<< THIS ONE INITS THE CUSTOM PROMPTS FOR INSERT
contents = get_content(file_paths)
rag.insert(contents, file_paths=file_paths)
rag.query(...)
```

vs. an alternative:

```python
rag = asyncio.run(initialize_rag())
contents = get_content(file_paths)
rag.insert(contents, file_paths=file_paths, prompts=prompts)  # <<< THIS ONE INSERTS WITH CUSTOM PROMPTS
```
Note: custom prompts here are something like `DEFAULT_LANGUAGE`, `entity_extraction`, or `summarize_entity_descriptions`. We could get rid of e.g. `aquery_with_separate_keyword_extraction` or the `addon_params` key `"language"`.
Why not leave it, in both scenarios, close to the actual call as a param, or why not add it to `QueryParam`?
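One way to sketch that alternative (purely illustrative; the class below is a simplified stand-in, and LightRAG's actual `QueryParam` has no `prompts` field):

```python
from dataclasses import dataclass, field


@dataclass
class QueryParam:
    """Simplified stand-in for LightRAG's QueryParam, extended with a
    hypothetical per-call `prompts` override dict."""
    mode: str = "hybrid"
    prompts: dict = field(default_factory=dict)


# Per-call prompt overrides stay scoped to this one query, so concurrent
# users never touch a shared global PROMPTS dictionary:
param = QueryParam(mode="mix", prompts={"mix_rag_response": "Answer tersely..."})
```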
What's the status here?
I am waiting for maintainer @danielaskdd to review latest proposal.
> The current design, as discussed in #1353, has some limitations, especially when we want to support dynamic prompt changes, e.g. through a UI. Using static, globally shared `PROMPTS` dictionaries introduces risks in concurrent environments, where one user's changes can unintentionally affect another's session.
Agree.
> @drahnreb could you clarify the reasoning behind the two different designs you've used?
Sure:
```python
rag = asyncio.run(initialize_rag(addon_params))  # addon_params carries the custom prompts
contents = get_content(file_paths)
rag.insert(contents, file_paths=file_paths)
```

`addon_params` are already available for certain prompt elements, e.g. the language.
So I just completed the scope.
```python
rag.query(
    "What are the top themes in this story?",
    param=QueryParam(mode="hybrid"),
    system_prompt=docs["books"]["system_prompts"]["rag_response"],
)
```

Again, the `system_prompt` argument was already present and does exactly the same; I just showed how both could originate from similar JSON templates.
> Why not leave it, in both scenarios, close to the actual call as a param, or why not add it to `QueryParam`?
We could certainly do this, but it breaks a couple of things, and I wanted to avoid that, because you can do most of it as shown without disadvantages other than UX. Again, waiting for maintainer feedback.
Thanks for your valuable feedback! Would love to see this merged very soon.
Regarding `system_prompt`: just noticed it myself after I sent it; I changed my initial comment :P
> Regarding `system_prompt`: just noticed it myself after I sent it; I changed my initial comment :P
Since you changed your comment: I used to have a `prompt` param in `.insert()`, but that is a design choice with more consequences. Like I said, this is a PR with minimal changes that mostly does the job; if we were to refactor completely, it should have a discussion and then another PR with more substantial changes, in my opinion.
Which is preferable, @danielaskdd?
I also agree that this might be a more advanced solution, but to make it complete you would need to edit both the document-upload UI and the query UI. Both locations should have a pop-up tab where you can configure all the stated `PROMPTS[...]` possibilities, or add a new PROMPTS configuration file somewhere else. Right?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This pull request has been automatically closed because it has not had recent activity.