complete `addon_params` with all prompt templates
Description
Today you can only control the language and a very broad set of entity types.
This PR exposes all prompt template keys by adding them to `addon_params`, allowing easy customization of extraction prompts.
This gives more fine-grained control over prompts: for example, you could instantiate separate objects with dedicated `addon_params` for certain types of text, using more suitable, domain-relevant few-shot examples.
It also opens the way to impose more structure, e.g. via an ontology or (causal) relations.
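As a sketch of what this enables (the domain, key values, and variable name below are illustrative, not part of the PR), a domain-specific instantiation could look like:

```python
# Hypothetical addon_params for a medical-text corpus; the keys mirror the
# newly exposed PROMPTS entries, the values are made-up domain examples.
medical_addon_params = {
    "language": "English",
    "entity_types": ["disease", "drug", "gene", "symptom"],
    "entity_extraction_examples": [
        "Example 1: ... extract (disease: 'asthma'), (drug: 'salbutamol') ..."
    ],
}

# Each corpus then gets its own instance with its own prompt customization,
# e.g. rag = LightRAG(working_dir=..., addon_params=medical_addon_params)
```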
Related Issues
None
Changes Made
Added the following keys (exposed in `addon_params` without the `DEFAULT_` prefix):

- `PROMPTS["DEFAULT_TUPLE_DELIMITER"]`
- `PROMPTS["DEFAULT_RECORD_DELIMITER"]`
- `PROMPTS["DEFAULT_COMPLETION_DELIMITER"]`
- `PROMPTS["summarize_entity_descriptions"]`
- `PROMPTS["entity_extraction_examples"]`
- `PROMPTS["entity_extraction"]`
- `PROMPTS["entity_continue_extraction"]`
- `PROMPTS["entity_if_loop_extraction"]`
- `PROMPTS["keywords_extraction_examples"]`
- `PROMPTS["keywords_extraction"]`
- `PROMPTS["mix_rag_response"]`
- `PROMPTS["naive_rag_response"]`
- `PROMPTS["similarity_check"]`
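For reference, the lookup order this implies can be sketched as follows. This is a minimal stand-in, assuming `addon_params` keys override the global `PROMPTS` defaults; `resolve_prompt` is a hypothetical helper, not LightRAG API:

```python
# Tiny stand-in for the real PROMPTS dictionary in prompts.py.
PROMPTS = {
    "DEFAULT_TUPLE_DELIMITER": "<|>",
    "entity_extraction": "-Goal- Extract entities from {input_text} ...",
}


def resolve_prompt(key, addon_params, prompts=PROMPTS):
    """Return the prompt for `key`: a user override in addon_params wins,
    otherwise fall back to the global default. Delimiters are stored with a
    DEFAULT_ prefix in PROMPTS but exposed without it in addon_params."""
    if key in addon_params:
        return addon_params[key]
    if key in prompts:
        return prompts[key]
    return prompts[f"DEFAULT_{key.upper()}"]


# Defaults apply until a caller supplies an override:
resolve_prompt("tuple_delimiter", {})                         # -> "<|>"
resolve_prompt("tuple_delimiter", {"tuple_delimiter": "||"})  # -> "||"
```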
Checklist
- [x] Changes tested locally
- [x] Code reviewed
- [x] Documentation updated (if necessary)
- [x] Unit tests added (if applicable)
Additional Notes
@danielaskdd Please review. I left it simple and just added to `addon_params`, but this could also be grouped into a `prompts` dict.
@danielaskdd ready to merge if you want.
This is the intention. Once all prompt templates are exposed with this PR you could do this:
Directory structure:

```
my_docs/
├── books/
│   ├── book1.txt
│   └── book2.txt
└── articles/
    ├── article1.txt
    ├── article2.txt
    └── insert_template_prompts.json
my_queries/
└── articles/
    └── query_template_prompts.json
```
```python
import asyncio
import json
from typing import Optional

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete
from lightrag.kg.shared_storage import initialize_pipeline_status

WORKING_DIR = "./rag_storage"


async def initialize_rag(addon_params: Optional[dict] = None):
    rag_kwargs = {
        "working_dir": WORKING_DIR,
        "llm_model_func": gpt_4o_mini_complete,
    }
    # Only add addon_params if provided by the caller; otherwise it would
    # override the default_factory (still fine: the default language is
    # pulled from PROMPTS).
    if addon_params is not None:
        rag_kwargs["addon_params"] = addon_params
    rag = LightRAG(**rag_kwargs)
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag
```
```python
# Create the file-based example templates
with open("./my_docs/articles/insert_template_prompts.json", "w") as f:
    json.dump(
        {"entity_extraction_examples": ["device", "make", "model", "publication", "date"]},
        f,
    )
with open("./my_queries/articles/query_template_prompts.json", "w") as f:
    json.dump({"rag_response": "System prompt specific to articles..."}, f)
```
```python
with open("./my_docs/articles/insert_template_prompts.json") as f:
    articles_addon_params = json.load(f)
with open("./my_queries/articles/query_template_prompts.json") as f:
    articles_system_prompts = json.load(f)

docs = {
    "books": {
        "file_paths": ["./my_docs/books/book1.txt", "./my_docs/books/book2.txt"],
        "addon_params": {
            "entity_extraction_examples": ["organization", "person", "location"],
        },
        "system_prompts": {
            "rag_response": "KG mode system prompt specific to books...",
            "naive_rag_response": "Naive mode system prompt specific to books...",
            "mix_rag_response": "Mix mode system prompt specific to books...",
        },
    },
    "articles": {
        "file_paths": ["./my_docs/articles/article1.txt", "./my_docs/articles/article2.txt"],
        "addon_params": articles_addon_params,
        "system_prompts": articles_system_prompts,
    },
}
```
```python
def get_content(file_paths):
    contents = []
    for fp in file_paths:
        with open(fp, "r", encoding="utf-8") as f:
            contents.append(f.read())
    return contents
```
```python
# Insert differently per doc type
for doc_type, doc_info in docs.items():
    file_paths = doc_info["file_paths"]
    addon_params = doc_info["addon_params"]
    # Initialize a RAG instance for each document type
    print(f"Initializing RAG for {doc_type}")
    rag = asyncio.run(initialize_rag(addon_params))
    contents = get_content(file_paths)
    rag.insert(contents, file_paths=file_paths)
```
```python
# Perform a hybrid search with queries specific to the `books` type
print(
    rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="hybrid"),
        # Use the hybrid-mode system prompt for the books type data
        system_prompt=docs["books"]["system_prompts"]["rag_response"],
    )
)
```
Of course you could write convenience functions for the template handling: template checks (are the placeholders present?) or the correct query-template associations. For example, for `local`, `global`, and `hybrid` you could specify `rag_response`, for `mix` it could be `mix_rag_response`, and for `naive` it could be `naive_rag_response`, keeping it aligned with the current `prompts.py`, and pass any of them to `system_prompt` as illustrated in the last example.
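As a sketch of such a check (a hypothetical helper; the required placeholder names and the mode mapping below are assumptions for illustration, not the authoritative set from `prompts.py`), Python's `string.Formatter` can list the `{placeholders}` a template actually contains:

```python
import string

# Assumed required placeholders per template key (illustrative only).
REQUIRED_PLACEHOLDERS = {
    "entity_extraction": {"input_text", "entity_types", "tuple_delimiter"},
    "summarize_entity_descriptions": {"entity_name", "description_list"},
}

# Possible mode -> response-template association, mirroring prompts.py naming.
MODE_TO_RESPONSE_KEY = {
    "local": "rag_response",
    "global": "rag_response",
    "hybrid": "rag_response",
    "mix": "mix_rag_response",
    "naive": "naive_rag_response",
}


def missing_placeholders(template: str, key: str) -> set:
    """Return the required placeholders a custom template forgot to include."""
    found = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return REQUIRED_PLACEHOLDERS.get(key, set()) - found


custom = "Extract all {entity_types} from the text: {input_text}"
missing_placeholders(custom, "entity_extraction")  # -> {"tuple_delimiter"}
```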
We could open a new PR for handling and checks that warn users when placeholders are missing. Since a missing placeholder would not cause a failure for now, this could serve as an illustrative example in the meantime. Should we also add the information to the README and an example?
- cleaned up and separated query from insert prompts d71ceb9
- added checks to prevent possible problems when customizing critical prompt templates 3d7b1df
- added exhaustive examples to illustrate usage 01aee34
This should address the core items. @danielaskdd PTAL when convenient.
Just in case I missed it: @danielaskdd, do you still need anything to approve this PR?
What's the status here?
The proposal here actually seems reasonable. The current design, as discussed in #1353, has some limitations, especially when we want to support dynamic prompt changes, e.g. through a UI. Using static, globally shared `PROMPTS` dictionaries introduces risks in concurrent environments, where one user's changes can unintentionally affect another's session.
Moving toward passing prompts explicitly, e.g. via a param, would solve this scoping issue and is easy to use; it feels quite natural.
@drahnreb could you clarify the reasoning behind the design you've used?
```python
rag = asyncio.run(initialize_rag(addon_params))  # <<< THIS ONE INITS THE CUSTOM PROMPTS FOR INSERT
contents = get_content(file_paths)
rag.insert(contents, file_paths=file_paths)
rag.query(...)
```

vs. an alternative:

```python
rag = asyncio.run(initialize_rag())
contents = get_content(file_paths)
rag.insert(contents, file_paths=file_paths, prompts=prompts)  # <<< THIS ONE INSERTS WITH CUSTOM PROMPTS
```
Note: custom prompts here are something like `DEFAULT_LANGUAGE`, `entity_extraction`, or `summarize_entity_descriptions`. We could get rid of e.g. `aquery_with_separate_keyword_extraction` or the `addon_params` key `"language"`.
Why not leave it, in both scenarios, close to the actual call as a param, or why not add it to `QueryParam`?
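One way to sketch that alternative (purely illustrative; the class below is a simplified stand-in, and LightRAG's actual `QueryParam` has no `prompts` field):

```python
from dataclasses import dataclass, field


@dataclass
class QueryParam:
    """Simplified stand-in for LightRAG's QueryParam, extended with a
    hypothetical per-call `prompts` override dict."""
    mode: str = "hybrid"
    prompts: dict = field(default_factory=dict)


# Per-call prompt overrides stay scoped to this one query, so concurrent
# users never touch a shared global PROMPTS dictionary:
param = QueryParam(mode="mix", prompts={"mix_rag_response": "Answer tersely..."})
```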
What's the status here?
I am waiting for maintainer @danielaskdd to review latest proposal.
> The current design, as discussed in #1353, has some limitations, especially when we want to support dynamic prompt changes, e.g. through a UI. Using static, globally shared `PROMPTS` dictionaries introduces risks in concurrent environments, where one user's changes can unintentionally affect another's session.
Agree.
> @drahnreb could you clarify the reasoning behind the two different designs you've used?
Sure:
```python
rag = asyncio.run(initialize_rag(addon_params))  # addon_params carries the custom prompts
contents = get_content(file_paths)
rag.insert(contents, file_paths=file_paths)
```

`addon_params` are already available for certain prompt elements, e.g. the language.
So I just completed the scope.
```python
rag.query(
    "What are the top themes in this story?",
    param=QueryParam(mode="hybrid"),
    system_prompt=docs["books"]["system_prompts"]["rag_response"],
)
```

Again, the `system_prompt` argument was already present and does exactly the same; I just showed how both could originate from similar JSON templates.
> Why not leave it, in both scenarios, close to the actual call as a param, or why not add it to `QueryParam`?
We could certainly do this, but it breaks a couple of things, and I wanted to avoid that, because you can do most of it as shown without disadvantages other than UX. Again, waiting for maintainer feedback.
Thanks for your valuable feedback! Would love to see this merged very soon.
Regarding `system_prompt`: just noticed it myself after I sent it; I changed my initial comment :P
> Regarding `system_prompt`: just noticed it myself after I sent it; I changed my initial comment :P
Since you changed your comment: I used to have a `prompt` param in `.insert()`, but that is a design choice with more consequences. Like I said, this is a PR with minimal changes that mostly does the job; if we were to refactor completely, it should have a discussion and then another PR with more substantial changes, in my opinion.
Which is preferable, @danielaskdd?
I also agree that this might be a more advanced solution, but to make it complete you would need to edit both the document-upload UI and the query UI. Both locations should have a pop-up tab where you can configure all the stated `PROMPTS[...]` possibilities, or add a new PROMPTS configuration file somewhere else. Right?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This pull request has been automatically closed because it has not had recent activity.