[Feature Request]: Query only a specific range of nodes and edges in data storage

Open Exploding-Soda opened this issue 8 months ago • 6 comments

Do you need to file a feature request?

  • [x] I have searched the existing feature request and this feature request is not already filed.
  • [x] I believe this is a legitimate feature request, not just a question or bug.

Feature Request Description

Hello, I am currently using Lightrag-server and Neo4j Community Edition. In simple usage scenarios we might store all nodes in a single database, so some entities may contain conflicting information. For example, I uploaded two documents, "The Indecision Overthinker.txt" and "The Ruthless Avenger.txt", which provide completely different descriptions of the entity Hamlet. However, I may only want to reference files whose filenames contain specific content, such as "academic," "financial," or "technical."

Image

I checked the prompt sent to the LLMs during the Retrieve stage, which indeed includes various attributes of the entity (including the date and source filename). In my database, I have stored both documents related to Hamlet. I attempted to instruct the LLMs to extract knowledge only from "The Ruthless Avenger.txt", and in fact, it did so correctly.

Image

Image

I believe the following needs may arise in actual use: Sometimes, users may prefer to extract more recent knowledge. Sometimes, users may already know which documents or what range of documents they want to extract knowledge from.

Thoughts: Simply allow specifying a date to extract only knowledge from that point onward, or only knowledge within a certain time frame from the current date.
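A minimal sketch of the date-based idea, assuming each retrieved node is available as a plain dictionary with a `created_at` Unix timestamp. The field name `created_at`, the `filter_by_time` helper, and the sample records are hypothetical and not part of LightRAG's API; the filter would run on retrieval results before they are placed into the prompt.

```python
from datetime import datetime, timedelta, timezone


def filter_by_time(nodes, since=None, within=None, now=None):
    """Keep nodes created at or after a cutoff.

    since:  absolute cutoff as a timezone-aware datetime
    within: relative window as a timedelta counted back from `now`
    If both are given, the later (stricter) cutoff is used.
    """
    now = now or datetime.now(timezone.utc)
    cutoffs = []
    if since is not None:
        cutoffs.append(since)
    if within is not None:
        cutoffs.append(now - within)
    if not cutoffs:
        return list(nodes)
    cutoff = max(cutoffs).timestamp()
    return [n for n in nodes if n["created_at"] >= cutoff]


# Hypothetical retrieval results; `created_at` is an assumed field.
nodes = [
    {"entity": "Hamlet", "file_path": "The Indecision Overthinker.txt",
     "created_at": datetime(2025, 3, 1, tzinfo=timezone.utc).timestamp()},
    {"entity": "Hamlet", "file_path": "The Ruthless Avenger.txt",
     "created_at": datetime(2025, 4, 1, tzinfo=timezone.utc).timestamp()},
]

# "From that point onward": an absolute cutoff.
recent = filter_by_time(nodes, since=datetime(2025, 3, 15, tzinfo=timezone.utc))

# "Within a certain time frame from the current date": a relative window.
last_week = filter_by_time(
    nodes,
    within=timedelta(days=7),
    now=datetime(2025, 4, 3, tzinfo=timezone.utc),
)
```

Passing `now` explicitly keeps the relative window reproducible; in real use it would default to the current time.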

Semantic processing of filenames: embedding might help locate the correct nodes using keywords, but this would require first retrieving all distinct file_path values from storage. This approach doesn't seem very reliable, but I believe that by specifying the file name in the prompt, the current LightRAG already allows for some degree of node classification. Alternatively, users could define custom tags for nodes when uploading files, though this may require more code modifications.
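The filename idea can be sketched without embeddings as a plain keyword match, assuming retrieved nodes are dictionaries that carry a `file_path` string. Both helper functions and the sample records are hypothetical illustrations rather than LightRAG functions.

```python
def distinct_file_paths(nodes):
    """Collect the unique file_path values, preserving first-seen order."""
    seen = {}
    for node in nodes:
        seen.setdefault(node["file_path"], None)
    return list(seen)


def filter_by_file_keyword(nodes, keywords):
    """Keep nodes whose file_path contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        n for n in nodes
        if any(k in n["file_path"].lower() for k in lowered)
    ]


# Hypothetical retrieval results; the dictionary layout is an assumption.
nodes = [
    {"entity": "Hamlet", "file_path": "The Indecision Overthinker.txt",
     "description": "Thinks too much and acts too little."},
    {"entity": "Hamlet", "file_path": "The Ruthless Avenger.txt",
     "description": "Strikes without mercy."},
]

paths = distinct_file_paths(nodes)
avenger_only = filter_by_file_keyword(nodes, ["avenger"])
```

A substring match is deterministic and avoids the reliability concern with embedding filenames. One caveat: if an entity's description has already been merged across several files, a node-level filter cannot separate the per-file text, so the filter would need to apply to the underlying chunks instead.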

These are still vague thoughts. I'm not sure whether there are other reliable storage methods that can already classify nodes and work with vanilla LightRAG; I just think I might need these features. If anyone is interested, please share more ideas.

Additional Context

The Indecision Overthinker: "I'm Hamlet—a man who thinks too much and acts too little. I question everything, second-guess every move, and get lost in my own doubts. Even when I know what must be done, I hesitate, trapped in endless 'what ifs.' My mind is my greatest enemy, paralyzing me when action is needed most. I'm not weak—just too aware of consequences, too afraid of making the wrong choice."

The Ruthless Avenger: "I’m Hamlet—a man who doesn’t hesitate when justice demands blood. I play the fool to deceive my enemies, but when the moment comes, I strike without mercy. I’ve sent traitors to their deaths, manipulated friends, and embraced violence when necessary. My father’s ghost called for vengeance, and I delivered. If others think me cruel, so be it—betrayal deserves no pity."

Entities near Hamlet in KG Image

Simply ask about Hamlet's character Image

Referencing documents with conflicting information can lead to a certain degree of misunderstanding Image

Exploding-Soda avatar Apr 03 '25 06:04 Exploding-Soda

Yes, I would also like to see "metadata" tags, such as "timestamp" or other relevant data, added to entities. These should be appendable when uploading a document. This way people can retrieve specifically the information added before or after a certain date.

I would also like to add that it could be very powerful for LightRAG, when handling a chunked document or a set of documents, to retrieve a few related nodes from the knowledge graph the content is being stored in. That way it would know whether information is duplicated, and it could better decide which entities a relationship belongs between.

An easy example is this: I created a big knowledge graph from information about cats. After uploading everything I saw that I now have entities called cats, cat, kitten, kittens and more words similar to cat and a piled up description of the cat entity.

It would be nice if, while the knowledge graph is being built, the existing entity-relationship information related to the chunk were supplied as well, so the "entity_extraction" process goes better.

This would probably avoid duplicate entities or duplicate entity descriptions.
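As a rough illustration of the duplicate-name problem, here is a sketch that groups entity names by a normalized key. It is a crude heuristic (lowercasing plus simple plural stripping), not how LightRAG's extraction works, and the function names are made up for this example.

```python
from collections import defaultdict


def normalize_name(name):
    """Lowercase, trim, and strip a simple plural suffix (a crude heuristic)."""
    key = name.strip().lower()
    if key.endswith("ies") and len(key) > 4:
        return key[:-3] + "y"
    if key.endswith("s") and not key.endswith("ss") and len(key) > 3:
        return key[:-1]
    return key


def group_duplicates(names):
    """Map each normalized key to the original names that collapse onto it."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


groups = group_duplicates(["cats", "cat", "Kitten", "kittens"])
```

This merges "cats" with "cat" and "Kitten" with "kittens", but it would not link "cat" and "kitten"; that kind of semantic merge would need the existing graph nodes to be compared during extraction, as suggested above.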

frederikhendrix avatar Apr 03 '25 13:04 frederikhendrix

+1 This would be useful for providing KG for user-by-user

Miyamura80 avatar Apr 06 '25 16:04 Miyamura80

+1 Filtering by metadata in queries, as in AWS Bedrock Knowledge Bases and OpenSearch, would be super cool to have. You can add parameters that narrow the context so the LLM doesn't mix up information between documents.

emgiezet avatar Apr 18 '25 12:04 emgiezet

+1

oliverkaiser avatar Apr 29 '25 15:04 oliverkaiser

+1

tanasecucliciu avatar May 08 '25 10:05 tanasecucliciu

+1

shaharzep avatar May 31 '25 13:05 shaharzep

+1

puritatemcordis avatar Jun 04 '25 13:06 puritatemcordis