[Bug]: 🐛 Bug Report: QueryParam ids parameter not working in API Server

Open marcosmarf27 opened this issue 3 months ago • 11 comments

Do you need to file an issue?

  • [ ] I have searched the existing issues and this bug is not already filed.
  • [ ] I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

The /query/data endpoint documents an ids parameter in the OpenAPI spec, but when used, it throws: QueryParam.__init__() got an unexpected keyword argument 'ids'

To Reproduce

    curl -X 'POST' \
      'https://your-lightrag-server/query/data' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{ "query": "test query", "mode": "mix", "ids": ["doc-ef964adcce4a038d6d08a4ade4717dab"] }'

Error Response

    { "detail": "QueryParam.__init__() got an unexpected keyword argument 'ids'" }

Expected behavior

The query should filter results to only include the specified document IDs.

Evidence

  • OpenAPI spec documents ids parameter as valid
  • Internal code uses query_param.ids (see issue #1950)
  • QueryParam class definition shows ids: list[str] | None = None

Environment

  • LightRAG Server version: [current version]
  • API endpoint: /query/data

Suggested Fix

There appears to be a mismatch between the API layer and QueryParam initialization. The parameter mapping may need to be updated.
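
To illustrate the kind of mismatch described above, here is a minimal sketch in plain Python. `QueryParamSketch` and `build_param` are hypothetical stand-ins, not LightRAG's actual classes or API layer; the sketch only assumes that the server forwards request fields as keyword arguments to a dataclass that does not declare `ids`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Tuple


@dataclass
class QueryParamSketch:
    """Hypothetical stand-in for a parameter class that has no `ids` field."""
    mode: str = "mix"
    top_k: int = 10


def build_param(payload: Dict[str, Any]) -> Tuple[QueryParamSketch, List[str]]:
    """Keep only keys the dataclass declares; report the rest instead of crashing."""
    known = {f.name for f in fields(QueryParamSketch)}
    accepted = {k: v for k, v in payload.items() if k in known}
    rejected = sorted(k for k in payload if k not in known)
    return QueryParamSketch(**accepted), rejected


# Request fields from the report, minus the query text itself.
payload = {"mode": "mix", "ids": ["doc-ef964adcce4a038d6d08a4ade4717dab"]}

# Forwarding every field blindly reproduces the reported failure mode.
try:
    QueryParamSketch(**payload)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Filtering first lets the server decide how to handle unsupported fields.
param, rejected = build_param(payload)
print(error_message)
print(param, rejected)
```

With a mapping step like this, an API layer could return a clear validation error for the unsupported fields rather than surfacing a raw `TypeError`. Whether the real server should reject or ignore such fields is a design decision this sketch does not settle.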

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

Paste your config here

Logs and screenshots

No response

Additional Information

v1.4.9rc1/0228

marcosmarf27 avatar Sep 23 '25 17:09 marcosmarf27

As of version 1.4.9rc2, the data format for the /query/data endpoint has undergone changes. The former dynamic ID fields have been deprecated and removed, while a new references array/list has been added. Developers are advised to utilize chunk_id as the designated identifier moving forward.

danielaskdd avatar Sep 24 '25 20:09 danielaskdd

The ids query parameter for filtering has been deprecated.

danielaskdd avatar Sep 24 '25 20:09 danielaskdd

With the removal of the ids parameter, we lost the ability to filter by specific documents. Filtering by chunk_id is too limited for practical use cases.

Direct question: How can we filter queries to return results from only one specific document in the new architecture?

Context: I need to analyze legal documents individually, but chunk_id doesn't allow this. Is there any alternative or will this feature be reintroduced?


marcosmarf27 avatar Sep 24 '25 20:09 marcosmarf27

I plan to upload several legal cases as PDFs and query each one individually. Will that not be possible?

marcosmarf27 avatar Sep 24 '25 20:09 marcosmarf27

Doc_ID filtering is not working as expected for Graph-based RAG. This is because entity and relationship descriptions are aggregations of information from multiple documents, making them inherently unfilterable by a single document ID.

danielaskdd avatar Sep 24 '25 20:09 danielaskdd

Proposed Workarounds:

  • Build separate knowledge bases, each tailored to distinct topics.
  • For specialized query requirements, temporarily construct dedicated knowledge bases. This process can be significantly accelerated by leveraging LLM caching.
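
As a rough illustration of the "separate knowledge bases" workaround, the sketch below keeps one knowledge base per key (for example, per document or per case) and creates each one lazily. `KnowledgeBaseRegistry` and the placeholder factory are hypothetical; in practice the factory might construct a LightRAG instance pointed at its own working directory, which is an assumption here and not something this sketch does.

```python
from pathlib import Path
from typing import Callable, Dict, Generic, TypeVar

KB = TypeVar("KB")


class KnowledgeBaseRegistry(Generic[KB]):
    """Hypothetical helper: lazily creates and caches one knowledge base per key."""

    def __init__(self, root: Path, factory: Callable[[Path], KB]) -> None:
        self._root = root
        self._factory = factory
        self._cache: Dict[str, KB] = {}

    def get(self, key: str) -> KB:
        # Each key gets its own directory, so indexes never mix documents.
        if key not in self._cache:
            self._cache[key] = self._factory(self._root / key)
        return self._cache[key]


def placeholder_factory(working_dir: Path) -> dict:
    # Stand-in for building a real knowledge base in `working_dir`.
    return {"working_dir": working_dir}


registry = KnowledgeBaseRegistry(Path("kb_root"), placeholder_factory)
case_a = registry.get("case-a")
case_b = registry.get("case-b")
print(case_a["working_dir"], case_b["working_dir"])
```

Queries for a given case then go only to that case's knowledge base, which gives document-level isolation at the cost of indexing each document set separately.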

danielaskdd avatar Sep 24 '25 20:09 danielaskdd

Oh, it broke my legs! It won't be useful for my use case, which is analyzing long lawsuits.

marcosmarf27 avatar Sep 24 '25 21:09 marcosmarf27

Use standard chunk-based search for comprehensive queries that require document-level filtering, and use LightRAG's graph-based RAG for precision-focused retrieval on specific document subsets. To streamline the querying process, build an AI agent that processes each user query and routes it to the appropriate retrieval path.

danielaskdd avatar Sep 24 '25 22:09 danielaskdd

@danielaskdd looking at the commit a923d378dd8c9561780f6da0370ffbd3b9b61877, your comment specifically states that ID-based filtering only applies to chunks and is specific to PostgreSQL storage, and hence you were deprecating it.

In your comment above, however, you state:

As of version 1.4.9rc2, the data format for the /query/data endpoint has undergone changes. The former dynamic ID fields have been deprecated and removed, while a new references array/list has been added. Developers are advised to utilize chunk_id as the designated identifier moving forward.

I get that entities etc. from the graph have multiple sources, but these can still be disregarded due to the existence of a single chunk_id in the source IDs. So I think there may be two possible approaches:

  1. Filter the search results by chunk ID (and by source_id for the entities) ourselves before sending the data to the LLM.

  2. Generate a LightRAG instance with the subset of documents we care about for the specific query.

My preference is 2 for my use case, because I know ahead of time the documents a particular query cares about.
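
Approach 1 could be sketched roughly as follows. The response shape is an assumption: the field names `chunks`, `entities`, `relationships`, `chunk_id`, and `source_id`, and the `<SEP>` separator between chunk IDs, are illustrative and should be checked against the actual /query/data output. `filter_by_chunks` is a hypothetical helper, and the caller must already know which chunk IDs belong to the documents of interest.

```python
from typing import Any, Dict, Iterable

SEP = "<SEP>"  # Assumed separator between chunk IDs inside a source_id string.


def filter_by_chunks(data: Dict[str, Any], allowed: Iterable[str]) -> Dict[str, Any]:
    """Keep chunks in `allowed`, plus graph items that cite at least one of them."""
    allowed_set = set(allowed)

    def touches_allowed(item: Dict[str, Any]) -> bool:
        source_ids = (item.get("source_id") or "").split(SEP)
        return any(cid in allowed_set for cid in source_ids)

    return {
        "chunks": [c for c in data.get("chunks", []) if c.get("chunk_id") in allowed_set],
        "entities": [e for e in data.get("entities", []) if touches_allowed(e)],
        "relationships": [r for r in data.get("relationships", []) if touches_allowed(r)],
    }


# Illustrative data only; not real server output.
data = {
    "chunks": [{"chunk_id": "chunk-a1"}, {"chunk_id": "chunk-b1"}],
    "entities": [
        {"entity_name": "Contract", "source_id": "chunk-a1<SEP>chunk-b1"},
        {"entity_name": "Witness", "source_id": "chunk-b1"},
    ],
    "relationships": [
        {"src_id": "Contract", "tgt_id": "Witness", "source_id": "chunk-b1"},
    ],
}

filtered = filter_by_chunks(data, ["chunk-a1"])
print(filtered)
```

Note that an entity kept by this filter (such as `Contract` above) may still carry a description merged from other documents, which is the limitation raised earlier in the thread. The filter controls which items reach the LLM, not what each item's text contains.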

listentorick avatar Oct 04 '25 21:10 listentorick

Creating a document subset is a good idea compared to filtering by document ID, but this feature will need to wait until LightRAG supports multi-workspace switching.

danielaskdd avatar Oct 05 '25 00:10 danielaskdd