[Question]: Why can't the KG built through Python execution be seen in the LightRAG Server?
Do you need to ask a question?
- [x] I have searched the existing question and discussions and this question is not already answered.
- [x] I believe this is a legitimate question, not just a bug or feature request.
Your Question
Hello. I have successfully installed and started LightRAG and the Server, and I was also able to upload documents via the UI and parse out the KG through the pipeline. However, when I switched to another method (uploading documents via Python code), after the parsing was completed, I did not see the KG. My steps are as follows:
- I created a new project directory, created a .env file, and ran lightrag-server in the project directory.
- I ran the RAG initialization code in Python, and uploaded and parsed the documents (it took an hour to parse 10 documents). The code is as follows:
import asyncio
import nest_asyncio

# Enable nested event loops in Jupyter Notebook
nest_asyncio.apply()

import os
import inspect
import logging
from lightrag import LightRAG, QueryParam
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc
from lightrag.kg.shared_storage import initialize_pipeline_status

# Set the working directory
WORKING_DIR = "/data/github/LightRAG/znkf"

# Configure logging
logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)


# Initialize the RAG instance
async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=ollama_model_complete,
        llm_model_name="qwen2.5:14b",
        llm_model_max_async=2,
        llm_model_max_token_size=32768,
        llm_model_kwargs={
            "host": "http://localhost:11434",
            "options": {"num_ctx": 32768},
        },
        embedding_func=EmbeddingFunc(
            embedding_dim=1024,
            max_token_size=512,
            func=lambda texts: ollama_embed(
                texts, embed_model="quentinz/bge-large-zh-v1.5:latest", host="http://localhost:11434"
            ),
        ),
        addon_params={
            "language": "Simplified Chinese",
            "entity_types": ["产品系列", "手机品牌", "服务类型", "公司", "支付方式", "产品名称", "城市名",
                             "电话号码", "日期", "卡片类型", "优惠政策", "金额"],
        }
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag


# Initialize the RAG instance
loop = asyncio.get_event_loop()  # Get the current event loop
rag = loop.run_until_complete(initialize_rag())

# Build the initial KG
file_path = '/data/notebooks/knowledge/kg/kg_test'
file_list = []
for root, _, files in os.walk(file_path):
    for file in files:
        if any(file.endswith(ext) for ext in [".md", ".txt"]):
            filex = os.path.abspath(os.path.join(root, file))
            print(f">>> Processing file {filex} ...")
            file_list.append(filex)
            with open(filex, "r", encoding="utf-8") as f:
                rag.insert(f.read())
- I can see the generated documents in my project directory, but I don't see any results in the UI.
Additional Context
It feels like Python execution and UI operations are disconnected. Uploading documents directly through the UI is very convenient, but in practice I need to customize many execution parameters.
Judging by the execution time and GPU usage in Python, the run appears to have succeeded, but for some unknown reason the results were not synchronized to the UI.
I tried both the insert() method and uploading a file via the document/file endpoint. On my lightrag-server, each document I uploaded is displayed correctly, and so is the knowledge graph tab.
I'm running lightrag-server v1.3.0/1.2.6
@Exploding-Soda I know how to proceed now.
I just need to move all the files generated after executing Python into the subdirectory rag_storage.
I will continue testing functions like rag.create_entity.
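The move described above could be scripted roughly as follows. This is a minimal sketch, not part of LightRAG: `move_generated_files` is a hypothetical helper, the `rag_storage` subdirectory name comes from this thread, and the assumption that the generated storage files end in `.json` or `.graphml` should be checked against what is actually in the project directory.

```python
import shutil
from pathlib import Path


def move_generated_files(project_dir, storage_subdir="rag_storage",
                         suffixes=(".json", ".graphml")):
    """Move top-level storage files from project_dir into project_dir/storage_subdir.

    Hypothetical helper: the suffix filter is an assumption about which
    files the Python run generated; adjust it to match your directory.
    """
    project = Path(project_dir)
    target = project / storage_subdir
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    for item in sorted(project.iterdir()):
        # Leave directories (including the target itself) and unrelated
        # files such as .env untouched.
        if item.is_dir() or item.suffix not in suffixes:
            continue
        shutil.move(str(item), str(target / item.name))
        moved.append(item.name)
    return moved
```

For the setup in this thread, the call would be `move_generated_files("/data/github/LightRAG/znkf")`, ideally while no Python insert job is still writing to that directory.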
@Exploding-Soda When I upload documents separately using the endpoint (starting lightrag-server in the project directory /data/github/LightRAG/znkf), everything works fine, and the correct KG is constructed. 👍
However, when I use Python code to perform rag.insert (starting lightrag-server in the project directory /data/github/LightRAG/znkf and setting WORKING_DIR = "/data/github/LightRAG/znkf"), the parsing process completes normally, but all the parsed data is stored directly in /data/github/LightRAG/znkf and nothing is displayed in the UI.
After I move the parsed results to /data/github/LightRAG/znkf/rag_storage, the UI displays correctly, and the KG graph appears.
However, when I continue executing rag.create_entity, the UI still does not update in real time.
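To avoid moving files after every run, the Python side could write directly into the directory the server reads from. A minimal sketch, assuming (as observed above) that a server started in the project directory keeps its data in the `rag_storage` subdirectory; `server_working_dir` is a hypothetical helper, not a LightRAG API.

```python
import os

# Directory in which lightrag-server was started (taken from this thread).
PROJECT_DIR = "/data/github/LightRAG/znkf"


def server_working_dir(project_dir, storage_subdir="rag_storage", create=False):
    """Return the storage directory the server is assumed to read from.

    Assumption: the server uses <project_dir>/rag_storage, as observed
    in this thread. Set create=True to create the directory if missing.
    """
    path = os.path.join(project_dir, storage_subdir)
    if create:
        os.makedirs(path, exist_ok=True)
    return path


# Use this value instead of the bare project directory.
WORKING_DIR = server_working_dir(PROJECT_DIR)
```

The resulting `WORKING_DIR` would then be passed as `working_dir=WORKING_DIR` to `LightRAG(...)` in the initialization code from the question.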
That's all I know and wanted to share. I think the inputs and rag_storage directories determine what you see in the UI: if you empty those two directories, the UI becomes empty too. So it may be that the insert() function doesn't actually write to the files under those two directories, but I'm not sure.
In my scenario I mostly use the document/files endpoint, so I probably can't be of much help on this one. Hopefully you'll find an answer to your question.