
[Question]: It seems no related chunks are retrieved while there does exist related chunks and nodes in the KG

Open liuyuchen-cz opened this issue 8 months ago • 2 comments

Do you need to ask a question?

  • [x] I have searched the existing question and discussions and this question is not already answered.
  • [x] I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hi! I run lightrag-server in Chinese mode, with Deepseek 32B (via Ollama) as the LLM and bge-m3 as the embedding model. After I upload the text file in the WebUI, the KG is displayed, although its nodes are not exactly what I expect. When I ask a question based on the text, the answer is completely irrelevant to the text. How can I solve this problem, and could you help me improve the KG quality? I need more nodes and more precise relations.

Additional Context

INFO: Started server process [2424958]
INFO: Waiting for application startup.
INFO: Process 2424958 initialized updated flags for namespace: [full_docs]
INFO: Process 2424958 ready to initialize storage namespace: [full_docs]
INFO: Process 2424958 KV load full_docs with 1 records
INFO: Process 2424958 initialized updated flags for namespace: [text_chunks]
INFO: Process 2424958 ready to initialize storage namespace: [text_chunks]
INFO: Process 2424958 KV load text_chunks with 115 records
INFO: Process 2424958 initialized updated flags for namespace: [entities]
INFO: Process 2424958 initialized updated flags for namespace: [relationships]
INFO: Process 2424958 initialized updated flags for namespace: [chunks]
INFO: Process 2424958 initialized updated flags for namespace: [chunk_entity_relation]
INFO: Process 2424958 initialized updated flags for namespace: [llm_response_cache]
INFO: Process 2424958 ready to initialize storage namespace: [llm_response_cache]
INFO: Process 2424958 KV load llm_response_cache with 2 records
INFO: Process 2424958 initialized updated flags for namespace: [doc_status]
INFO: Process 2424958 ready to initialize storage namespace: [doc_status]
INFO: Process 2424958 doc status load doc_status with 1 records
INFO: Process 2424958 Pipeline namespace initialized

Server is ready to accept connections! 🚀

INFO: Application startup complete.
INFO: 100.103.105.101:43594 - "GET /auth-status HTTP/1.1" 200
INFO: 100.103.105.101:43593 - "GET /webui/logo.png HTTP/1.1" 304
INFO: 100.103.105.101:43593 - "GET /docs HTTP/1.1" 200
INFO: 100.103.105.101:43594 - "GET /openapi.json HTTP/1.1" 200
INFO: Process 2424958 KV writting 0 records to text_chunks
INFO: Process 2424958 drop text_chunks
INFO: Process 2424958 KV writting 0 records to full_docs
INFO: Process 2424958 drop full_docs
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/liuyuchen/lightrag/rag_storage/vdb_entities.json'} 0 data
INFO: Process 2424958 drop entities(file:/home/liuyuchen/lightrag/rag_storage/vdb_entities.json)
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/liuyuchen/lightrag/rag_storage/vdb_relationships.json'} 0 data
INFO: Process 2424958 drop relationships(file:/home/liuyuchen/lightrag/rag_storage/vdb_relationships.json)
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/liuyuchen/lightrag/rag_storage/vdb_chunks.json'} 0 data
INFO: Process 2424958 drop chunks(file:/home/liuyuchen/lightrag/rag_storage/vdb_chunks.json)
INFO: Process 2424958 drop graph chunk_entity_relation (file:/home/liuyuchen/lightrag/rag_storage/graph_chunk_entity_relation.graphml)
INFO: Process 2424958 doc status writting 0 records to doc_status
INFO: Process 2424958 drop doc_status
INFO: Successfully dropped JsonKVStorage
INFO: Successfully dropped JsonKVStorage
INFO: Successfully dropped NanoVectorDBStorage
INFO: Successfully dropped NanoVectorDBStorage
INFO: Successfully dropped NanoVectorDBStorage
INFO: Successfully dropped NetworkXStorage
INFO: Successfully dropped JsonDocStatusStorage
INFO: 100.103.105.101:43594 - "DELETE /documents HTTP/1.1" 200
INFO: 100.103.105.101:43594 - "POST /documents/upload HTTP/1.1" 200
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Stored 1 new unique documents
INFO: Successfully fetched and enqueued file: risk.txt
INFO: Processing 1 document(s) in 1 batches
INFO: Start processing batch 1 of 1.
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: == LLM cache == saving 19f004a0f7a06df956b4c978a3b11eee
INFO: == LLM cache == saving d4cbac0ee64e2190f3cf942494edd6e7
INFO: == LLM cache == saving 53daf64e335e4b878c721d70d2c5141f
INFO: == LLM cache == saving 06748f1e93c0949a4e4a8f07646d97ca
INFO: == LLM cache == saving 79ead2eff7cea021589c150dc6e3a6df
ERROR: Failed to extract entities and relationships
ERROR: Failed to process document doc-d80dd417ab1d37a0fc7dde0559ca9bc1:
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Process 2424958 KV writting 1 records to full_docs
INFO: Process 2424958 KV writting 5 records to text_chunks
INFO: Process 2424958 KV writting 7 records to llm_response_cache
INFO: Writing graph with 0 nodes, 0 edges
INFO: In memory DB persist to disk
INFO: Completed batch 1 of 1.
INFO: Document processing pipeline completed
INFO: == LLM cache == saving f51ce619b3ab13eaa2d870b99399d13e
INFO: Chk 1/5: extracted 1 Ent + 0 Rel (deduplicated)
INFO: == LLM cache == saving 23ad0193f9bd070dd2dae66f5733294b
INFO: Chk 2/5: extracted 16 Ent + 5 Rel (deduplicated)
INFO: == LLM cache == saving a05a14838ce03e2cbf5ee853585c0214
INFO: Chk 3/5: extracted 2 Ent + 0 Rel (deduplicated)
INFO: == LLM cache == saving 82ba1ed7f6b05aa0f7040bcb349d3dc0
INFO: Chk 4/5: extracted 4 Ent + 4 Rel (deduplicated)
INFO: Process 2424958 KV writting 0 records to text_chunks
INFO: Process 2424958 drop text_chunks
INFO: Process 2424958 KV writting 0 records to full_docs
INFO: Process 2424958 drop full_docs
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/liuyuchen/lightrag/rag_storage/vdb_entities.json'} 0 data
INFO: Process 2424958 drop entities(file:/home/liuyuchen/lightrag/rag_storage/vdb_entities.json)
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/liuyuchen/lightrag/rag_storage/vdb_relationships.json'} 0 data
INFO: Process 2424958 drop relationships(file:/home/liuyuchen/lightrag/rag_storage/vdb_relationships.json)
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/liuyuchen/lightrag/rag_storage/vdb_chunks.json'} 0 data
INFO: Process 2424958 drop chunks(file:/home/liuyuchen/lightrag/rag_storage/vdb_chunks.json)
INFO: Process 2424958 drop graph chunk_entity_relation (file:/home/liuyuchen/lightrag/rag_storage/graph_chunk_entity_relation.graphml)
INFO: Process 2424958 doc status writting 0 records to doc_status
INFO: Process 2424958 drop doc_status
INFO: Successfully dropped JsonKVStorage
INFO: Successfully dropped JsonKVStorage
INFO: Successfully dropped NanoVectorDBStorage
INFO: Successfully dropped NanoVectorDBStorage
INFO: Successfully dropped NanoVectorDBStorage
INFO: Successfully dropped NetworkXStorage
INFO: Successfully dropped JsonDocStatusStorage
INFO: 100.103.105.101:44473 - "DELETE /documents HTTP/1.1" 200
INFO: 100.103.105.101:44473 - "POST /documents/upload HTTP/1.1" 200
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Stored 1 new unique documents
INFO: Successfully fetched and enqueued file: risk.txt
INFO: Processing 1 document(s) in 1 batches
INFO: Start processing batch 1 of 1.
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Chk 1/5: extracted 1 Ent + 0 Rel (deduplicated)
INFO: Chk 2/5: extracted 16 Ent + 5 Rel (deduplicated)
INFO: Chk 3/5: extracted 2 Ent + 0 Rel (deduplicated)
INFO: == LLM cache == saving 936108fa63451c2613572b8a02f86ca6
INFO: Chk 4/5: extracted 0 Ent + 0 Rel (deduplicated)
INFO: == LLM cache == saving 82ba1ed7f6b05aa0f7040bcb349d3dc0
INFO: Chk 5/5: extracted 4 Ent + 4 Rel (deduplicated)
INFO: Merge N: RO | 2+0
INFO: Extracted 22 entities + 9 relationships (total)
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Process 2424958 KV writting 1 records to full_docs
INFO: Process 2424958 KV writting 5 records to text_chunks
INFO: Process 2424958 KV writting 12 records to llm_response_cache
INFO: Writing graph with 22 nodes, 9 edges
INFO: In memory DB persist to disk
INFO: Completed batch 1 of 1.
INFO: Document processing pipeline completed
INFO: 100.103.105.101:46638 - "GET /graph/label/list HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 22 | Edge count: 9
INFO: 100.103.105.101:46638 - "GET /graphs?label=*&max_depth=3&max_nodes=1000 HTTP/1.1" 200
ERROR: JSON parsing error: Extra data: line 5 column 1 (char 123)
WARNING: low_level_keywords and high_level_keywords is empty
INFO: 100.103.105.101:2033 - "POST /query/stream HTTP/1.1" 200
INFO: Process 2424958 buidling query context...
INFO: Query nodes: 风险回避, top_k: 10, cosine: 0.2
INFO: Local query uses 10 entites, 3 relations, 3 chunks
INFO: 100.103.105.101:4292 - "POST /query/stream HTTP/1.1" 200
INFO: Process 2424958 KV writting 13 records to llm_response_cache
INFO: 100.103.105.101:4333 - "POST /query/stream HTTP/1.1" 200
INFO: Process 2424958 buidling query context...
INFO: Query nodes: 风险回避, top_k: 10, cosine: 0.2
INFO: Query edges: 关系, top_k: 10, cosine: 0.2
INFO: Global query uses 14 entites, 9 relations, 2 chunks
INFO: Local query uses 10 entites, 3 relations, 3 chunks
INFO: 100.103.105.101:4396 - "POST /query/stream HTTP/1.1" 200
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [2424958]

The query is "风险回避有什么策略" ("What strategies are there for risk avoidance?"). At first the log reported that the high-level and low-level keywords were empty, so I filled them in manually with "关系" ("relation") and "风险回避" ("risk avoidance") respectively.
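The "Extra data" message in the log above is the error Python's `json` module raises when valid JSON is followed by more text. A minimal sketch of how that can happen and one way to tolerate it is shown below. The model reply is a made-up example, and `parse_first_json_object` is a hypothetical helper, not part of LightRAG.

```python
import json


def parse_first_json_object(text: str) -> dict:
    """Parse the first JSON object found in `text`, ignoring anything after it.

    Hypothetical helper for illustration; it assumes the first "{" in the
    reply really is the start of the keyword object.
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


# Made-up model reply: a valid JSON object followed by extra commentary.
reply = (
    '{\n'
    '  "high_level_keywords": ["关系"],\n'
    '  "low_level_keywords": ["风险回避"]\n'
    '}\n'
    'Here are the keywords.'
)

# A strict parse rejects the trailing text.
try:
    json.loads(reply)
    error_message = None
except json.JSONDecodeError as err:
    error_message = err.msg

# A tolerant parse stops at the end of the first object.
keywords = parse_first_json_object(reply)
print(error_message)
print(keywords["low_level_keywords"])
```

If the model emits reasoning text that itself contains a brace before the JSON, this sketch would start parsing at the wrong place, so it is only a starting point for debugging the keyword extraction step.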

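The answer, translated from Chinese, is: "Okay, now I need to think about strategies for risk avoidance. This topic sounds a bit specialized, but I will try to sort it out. First, risk avoidance should mean taking measures, when facing potential risks, to keep those risks from occurring or to reduce their impact, right? So what are the specific strategies?

I remember studying risk management before, which mentioned several approaches, such as risk transfer, risk mitigation, and risk acceptance. Risk avoidance should be one of these. But the question asks about the specific strategies of risk avoidance, so I need to think about which strategies fall within its scope.

First, it may simply mean avoiding the action itself. For example, if a project could bring very large risks, is simply not doing the project a form of risk avoidance? Take investing in a high-risk market: to avoid potential losses, one chooses not to enter. This should be the most direct form of risk avoidance.

Second, changing the goal or scope can lower the likelihood of a risk occurring. For example, the original plan was to complete a complex product development in three months, but given the heavy time pressure and the likelihood of problems, the schedule is extended to five months, which may reduce the risk. This kind of adjustment is also a risk avoidance strategy, right?

In addition, using an alternative is another method. When a plan carries higher risk, a lower-risk alternative can replace it. For example, when choosing a supplier, if the original supplier may be at risk of unstable supply, switch to a more dependable one.

There is also strictly controlling conditions to make sure operations stay within a controllable range. For example, when carrying out a chemistry experiment, strictly following safety procedures and using protective equipment can prevent accidents. This is also a risk avoidance strategy.

In addition, establishing a backup plan is another way. For example, in project management, a contingency plan is drawn up so that when certain key steps run into problems, one can switch to the backup plan in time and reduce losses. This may lean more toward risk response than complete avoidance, but it can sometimes serve to avoid risk as well.

There is also spreading out resources and tasks, lowering the risk of each part by sharing it. For example, when investing, instead of putting all funds into one high-risk project, spread them across several low-risk projects. That way, even if one project runs into trouble, the overall loss will not be too large.

Then there is relying on reliable technologies or methods, using technology that has already been proven in order to reduce the risk that uncertainty brings. For example, in software development, adopting a mature, stable framework rather than trying a new, unstable technology avoids the risk caused by immature technology.

In addition, seeking professional advice is also a good strategy. When facing complex problems, consulting experts can help identify potential risks and find better solutions, thereby avoiding those risks.

There is also carrying out thorough due diligence: before making a major decision, fully understand the relevant information and possible risk factors, so that the decision rests on sufficient information. This avoids the risk that comes from incomplete information.

One last point: reduce uncertainty. Through more research, analysis, and data collection, reduce dependence on unknown factors so that potential risks can be better predicted and controlled.

To summarize, the main risk avoidance strategies are: not doing high-risk things, adjusting the goal or scope to lower risk, using alternatives, strictly controlling conditions, establishing backup plans, spreading out resources, relying on reliable technology, seeking professional advice, carrying out due diligence, and reducing uncertainty. These are all ways to proactively keep risks from occurring.

In practice, however, completely avoiding risk is sometimes not realistic, because some projects carry risk but also offer large potential gains. In that case one needs to weigh the pros and cons and choose the most suitable strategy. For example, for some high-return but high-risk opportunities, one may lean toward accepting the risk rather than avoiding it entirely. So analyzing each case on its own terms is the key.

I also need to consider how applicable these strategies are in different situations. For example, in business decisions, avoiding entry into a fiercely competitive market may be reasonable; in personal health, changing lifestyle habits to avoid the risk of disease is also a strategy. Different fields may have different emphases and methods.

Overall, risk avoidance requires weighing many factors together and choosing suitable strategies for the specific situation. This requires not only sufficient awareness of potential risks but also flexible thinking and decision-making ability, in order to make wise choices in a complex environment.

Risk avoidance means taking a series of measures to avoid or reduce the occurrence and impact of potential risks. Here are some common risk avoidance strategies:

Risk avoidance strategies

Avoid the action itself

If a project or activity carries high risk, abandon it or do not take part, to avoid potential losses.

Adjust the goal or scope

Lower the likelihood of a risk occurring by reducing the size of the goal or extending the completion time.

Use an alternative

Look for lower-risk alternative methods or resources to reduce risk exposure.

Strictly control conditions

Operate within a controllable range, making sure to follow safety procedures and use protective measures, to prevent accidents.

Establish a backup plan

Draw up a contingency plan so that when a key step runs into problems, you can switch to the backup plan in time and reduce losses.

Spread out resources and tasks

Distribute resources or tasks across multiple parts to lower the risk impact of each part.

Rely on reliable technologies or methods

Adopt mature, stable solutions to avoid the risk that comes from immature new technology.

Seek professional advice

Consult experts to help identify potential risks and find better solutions.

Carry out due diligence

Before making a major decision, fully understand the relevant information and possible risk factors.

Reduce uncertainty

Through research, analysis, and data collection, reduce dependence on unknown factors and better predict and control risk.

Summary

Risk avoidance requires weighing potential risks together with the specific situation and choosing the most suitable strategy. In some cases, completely avoiding risk may not be realistic, so one must weigh the pros and cons and make a wise decision. Different fields may have different emphases and methods; flexibility and decision-making ability are key." This is totally wrong, because it has nothing to do with the given text file.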

liuyuchen-cz, Apr 14 '25 03:04

INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Chk 1/5: extracted 1 Ent + 0 Rel (deduplicated)
INFO: Chk 2/5: extracted 16 Ent + 5 Rel (deduplicated)
INFO: Chk 3/5: extracted 2 Ent + 0 Rel (deduplicated)
INFO: == LLM cache == saving 936108fa63451c2613572b8a02f86ca6
INFO: Chk 4/5: extracted 0 Ent + 0 Rel (deduplicated)
INFO: == LLM cache == saving 82ba1ed7f6b05aa0f7040bcb349d3dc0
INFO: Chk 5/5: extracted 4 Ent + 4 Rel (deduplicated)
INFO: Merge N: RO | 2+0
INFO: Extracted 22 entities + 9 relationships (total)
INFO: Process 2424958 doc status writting 1 records to doc_status
INFO: Process 2424958 KV writting 1 records to full_docs
INFO: Process 2424958 KV writting 5 records to text_chunks
INFO: Process 2424958 KV writting 12 records to llm_response_cache
INFO: Writing graph with 22 nodes, 9 edges

This tells me that either a lot went wrong during extraction or your input chunks contain very little information. 22 nodes and 9 edges is very few. In my experience, LLMs tend to perform better when the input text is in the same language as the (system/user) prompt text.
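To see which chunks contributed little or nothing, the per-chunk lines quoted above can be tallied with a short script. This is only a sketch: the regular expression is written against the log format shown in this thread, and `per_chunk_counts` is a hypothetical helper name.

```python
import re

# Per-chunk lines copied from the log excerpt quoted in this thread.
LOG = """\
INFO: Chk 1/5: extracted 1 Ent + 0 Rel (deduplicated)
INFO: Chk 2/5: extracted 16 Ent + 5 Rel (deduplicated)
INFO: Chk 3/5: extracted 2 Ent + 0 Rel (deduplicated)
INFO: Chk 4/5: extracted 0 Ent + 0 Rel (deduplicated)
INFO: Chk 5/5: extracted 4 Ent + 4 Rel (deduplicated)
"""

CHUNK_RE = re.compile(r"Chk (\d+)/(\d+): extracted (\d+) Ent \+ (\d+) Rel")


def per_chunk_counts(log_text: str) -> dict:
    """Map chunk index -> (entity count, relationship count)."""
    return {
        int(m[1]): (int(m[3]), int(m[4]))
        for m in CHUNK_RE.finditer(log_text)
    }


counts = per_chunk_counts(LOG)

# Chunks where extraction produced nothing at all.
empty_chunks = [i for i, (ent, rel) in counts.items() if ent == 0 and rel == 0]
# Chunks that yielded entities but no relationships between them.
no_relation_chunks = [i for i, (ent, rel) in counts.items() if ent > 0 and rel == 0]

total_entities = sum(ent for ent, _ in counts.values())
total_relations = sum(rel for _, rel in counts.values())

print("empty chunks:", empty_chunks)
print("chunks without relationships:", no_relation_chunks)
print("per-chunk totals:", total_entities, "entities,", total_relations, "relationships")
```

On this excerpt, chunk 4 yields nothing and chunks 1 and 3 yield no relationships, so nearly all of the graph comes from chunks 2 and 5. The per-chunk entity sum is one higher than the 22 reported in the log, which would be consistent with one entity being merged across chunks, though the log alone does not confirm that.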

I once uploaded documents in Dutch while the prompt was in English, and most of the extracted entities came out as English translations of the Dutch words.

Also, if you are able to run a 32B model, you should try to find the best model of that size. From the benchmarks I last read, Gemma3 27B performs extremely well.

If you know your documents will all be in one specific language, I would change prompt.py to that language. Good luck!
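The difference between asking for output in a language and writing the whole prompt in that language can be sketched as follows. Both templates are hypothetical and are not copied from LightRAG's prompt.py; `build_prompt` and the placeholder names are assumptions made for illustration.

```python
# Hypothetical templates for illustration only, NOT LightRAG's prompt.py.

# Variant A: English instructions with a slot for the output language.
ENGLISH_TEMPLATE = (
    "Identify all entities in the text below and describe how they relate.\n"
    "Write the output in {language}.\n"
    "Text:\n{input_text}"
)

# Variant B: the instructions themselves are written in Chinese.
CHINESE_TEMPLATE = (
    "请找出下文中的所有实体,并说明它们之间的关系。\n"
    "请使用{language}输出。\n"
    "文本:\n{input_text}"
)


def build_prompt(template: str, language: str, input_text: str) -> str:
    """Fill the language and text slots of a template."""
    return template.format(language=language, input_text=input_text)


sample_text = "风险回避有什么策略"  # the query string from this thread, reused as sample text

prompt_a = build_prompt(ENGLISH_TEMPLATE, "Chinese", sample_text)
prompt_b = build_prompt(CHINESE_TEMPLATE, "中文", sample_text)

print(prompt_a)
print("---")
print(prompt_b)
```

In variant A only one word of the prompt changes when the language setting changes, and the instructions stay in English. Variant B corresponds to the suggestion above of editing the prompt text itself.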

frederikhendrix, Apr 14 '25 06:04

@frederikhendrix Thanks for your valuable advice. I will try changing the prompt language in prompt.py manually. I thought setting SUMMARY_LANGUAGE to "Chinese" would change the prompt language, which is what I did, but it seems to have no effect XD.

liuyuchen-cz, Apr 14 '25 08:04

Please try using the mixed mode with reranker enabled in the latest version. Query quality has significantly improved since v1.4.0.
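A request for a mixed-mode query against the server might be built as in the sketch below. The `/query/stream` path comes from the log in this issue. The base URL, the `query` and `mode` field names, and the value `"mix"` are assumptions that should be checked against the server's `/docs` page, and reranker configuration is not shown because the thread gives no details on it.

```python
import json
import urllib.request

# Assumption: the server listens locally on this port; adjust as needed.
BASE_URL = "http://localhost:9621"


def build_query_request(question: str, mode: str = "mix") -> urllib.request.Request:
    """Build (but do not send) a POST request for a streamed query.

    The payload field names and the "mix" value are assumptions to verify
    against the server's OpenAPI docs.
    """
    payload = {"query": question, "mode": mode}
    return urllib.request.Request(
        f"{BASE_URL}/query/stream",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("风险回避有什么策略")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```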

danielaskdd, Jul 20 '25 07:07