[Bug]: 🐛 Bug Report: QueryParam ids parameter not working in API Server
Do you need to file an issue?
- [ ] I have searched the existing issues and this bug is not already filed.
- [ ] I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
The /query/data endpoint documents an ids parameter in its OpenAPI spec, but passing it raises: QueryParam.__init__() got an unexpected keyword argument 'ids'
To Reproduce
curl -X 'POST' \
  'https://your-lightrag-server/query/data' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "test query",
  "mode": "mix",
  "ids": ["doc-ef964adcce4a038d6d08a4ade4717dab"]
}'
Error Response
{
  "detail": "QueryParam.__init__() got an unexpected keyword argument 'ids'"
}
Expected behavior
The query should filter results to only include the specified document IDs.
Evidence
- The OpenAPI spec documents the ids parameter as valid.
- Internal code uses query_param.ids (see issue #1950).
- The QueryParam class definition shows ids: list[str] | None = None.
Environment
- LightRAG Server version: [current version]
- API endpoint: /query/data
Suggested Fix
There appears to be a mismatch between the API layer and QueryParam initialization. The parameter mapping may need to be updated.
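To illustrate how this kind of mismatch can surface, here is a minimal sketch. It assumes the API layer forwards every request field to the constructor as a keyword argument. The QueryParam below is a simplified stand-in, not LightRAG's actual class, and build_query_param is a hypothetical helper that reports undeclared keys instead of raising.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class QueryParam:
    """Simplified stand-in for illustration; not LightRAG's real class."""
    mode: str = "mix"
    top_k: int = 60


def build_query_param(payload: dict[str, Any]) -> tuple[QueryParam, set[str]]:
    """Build a QueryParam from a request payload and report ignored keys."""
    known = {f.name for f in fields(QueryParam)}
    accepted = {k: v for k, v in payload.items() if k in known}
    ignored = set(payload) - known
    return QueryParam(**accepted), ignored


payload = {"mode": "mix", "ids": ["doc-ef964adcce4a038d6d08a4ade4717dab"]}

# Forwarding every field blindly raises the same kind of TypeError as reported.
try:
    QueryParam(**payload)
except TypeError as exc:
    error_message = str(exc)

# Filtering against the declared fields avoids the crash and surfaces the gap.
param, ignored = build_query_param(payload)
```

Returning the ignored keys lets the API layer respond with a clear validation error instead of an internal TypeError when the documented schema and the installed QueryParam disagree.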
Steps to reproduce
No response
Expected Behavior
No response
LightRAG Config Used
Paste your config here
Logs and screenshots
No response
Additional Information
v1.4.9rc1/0228
As of version 1.4.9rc2, the data format for the /query/data endpoint has undergone changes. The former dynamic ID fields have been deprecated and removed, while a new references array/list has been added. Developers are advised to utilize chunk_id as the designated identifier moving forward.
The ids query parameter for filtering has been deprecated.
With the removal of the ids parameter, we lost the ability to filter by specific documents. Filtering by chunk_id is too limited for practical use cases.
Direct question: How can we filter queries to return results from only one specific document in the new architecture?
Context: I need to analyze legal documents individually, but chunk_id doesn't allow this. Is there any alternative or will this feature be reintroduced?
I intend to upload several case files as PDFs and query each one individually. Will that not be possible?
Filtering by document ID does not work as expected for graph-based RAG. Entity and relationship descriptions are aggregated from multiple documents, so they cannot be cleanly filtered by a single document ID.
Proposed Workarounds:
- Build separate knowledge bases, each tailored to distinct topics.
- For specialized query requirements, temporarily construct dedicated knowledge bases. This process can be significantly accelerated by leveraging LLM caching.
Oh, it broke my legs! It won't be useful for my use case, which is analyzing long lawsuits.
Use standard chunk-based search for comprehensive queries that require document-level filtering, and use LightRAG's graph-based RAG for precision-focused retrieval on specific document subsets. To streamline querying, an AI agent can be developed to process user queries and route each one to the appropriate backend.
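The routing idea above could look roughly like the sketch below. It is a rule-based stand-in for an agent, and all names in it (Route, route_query, and the backend labels "chunk_search" and "graph_rag") are hypothetical, not part of LightRAG.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    backend: str              # hypothetical labels: "chunk_search" or "graph_rag"
    query: str
    doc_ids: tuple[str, ...]  # documents the query must be restricted to


def route_query(query: str, doc_ids: Optional[list[str]] = None) -> Route:
    """Send document-scoped queries to chunk search, everything else to graph RAG."""
    if doc_ids:
        return Route("chunk_search", query, tuple(doc_ids))
    return Route("graph_rag", query, ())


scoped = route_query("Summarize the ruling", ["doc-ef964adcce4a038d6d08a4ade4717dab"])
unscoped = route_query("Which parties appear across all cases?")
```

A real agent would likely decide the route from the query text itself; here the caller supplies the document IDs explicitly to keep the sketch deterministic.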
@danielaskdd looking at commit a923d378dd8c9561780f6da0370ffbd3b9b61877, your comment there states that ID-based filtering applies only to chunks and is specific to PostgreSQL storage, and that this is why you were deprecating it.
In your comment above, however you state:
As of version 1.4.9rc2, the data format for the /query/data endpoint has undergone changes. The former dynamic ID fields have been deprecated and removed, while a new references array/list has been added. Developers are advised to utilize chunk_id as the designated identifier moving forward.
I get that entities and relationships from the graph have multiple sources, but they can still be kept or discarded based on whether a given chunk_id appears in their source IDs. So I think there are two possible approaches:
1. Filter the search results ourselves by chunk ID (and by source_id for the entities) before sending the data to the LLM.
2. Generate a LightRAG instance containing only the subset of documents we care about for the specific query.
My preference is option 2 for my use case, because I know ahead of time which documents a particular query cares about.
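For reference, the first approach (client-side filtering) might be sketched as follows. The payload shape is an assumption: the keys "chunks", "entities", and "relationships", the entity_name field, and the "<SEP>" separator inside source_id are illustrative and should be checked against the actual /query/data response. filter_by_chunk_ids is a hypothetical helper.

```python
from typing import Any

# Assumption: source_id joins the contributing chunk IDs with this separator.
FIELD_SEP = "<SEP>"


def filter_by_chunk_ids(
    data: dict[str, list[dict[str, Any]]], allowed: set[str]
) -> dict[str, list[dict[str, Any]]]:
    """Keep chunks whose chunk_id is allowed, plus entities and relationships
    whose source_id cites at least one allowed chunk."""

    def cites_allowed(item: dict[str, Any]) -> bool:
        sources = str(item.get("source_id", "")).split(FIELD_SEP)
        return any(s.strip() in allowed for s in sources)

    return {
        "chunks": [c for c in data.get("chunks", []) if c.get("chunk_id") in allowed],
        "entities": [e for e in data.get("entities", []) if cites_allowed(e)],
        "relationships": [r for r in data.get("relationships", []) if cites_allowed(r)],
    }


# Made-up sample data for illustration only.
sample = {
    "chunks": [{"chunk_id": "chunk-a"}, {"chunk_id": "chunk-b"}],
    "entities": [
        {"entity_name": "Plaintiff", "source_id": "chunk-a<SEP>chunk-b"},
        {"entity_name": "Witness", "source_id": "chunk-b"},
    ],
    "relationships": [{"source_id": "chunk-b"}],
}
filtered = filter_by_chunk_ids(sample, {"chunk-a"})
```

Note that this only limits which entities and relationships are passed on. Their descriptions may still contain text merged from other documents, which is the limitation described earlier in the thread.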
Creating a document subset is a good idea compared to filtering by document ID, but this feature will need to wait until LightRAG supports multi-workspace switching.