[Bug]: Failed to extract entities and relationships
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
Failed to extract entities and relationships Failed to process document doc-b15c2eafcc86764e2b11ad628d818ff4: 'metadata'
Steps to reproduce
No response
Expected Behavior
No response
LightRAG Config Used
Logs and screenshots
No response
Additional Information
- LightRAG Version:
- Operating System:
- Python Version:
- Related Issues:
What LightRAG version do you have? I had this error myself, but it seems to be fixed in the latest version.
Failed to extract entities and relationships. Failed to process document doc-xxxxxx.....: 'p'. I was using the repo as cloned yesterday.
I experienced the same problem with version 1.2.6. Specifically, the very first document always processed without problems, while the following error occurred with every subsequent document (I tried both batch insertion and inserting manually one after another):

```
Failed to extract entities and relationships
Failed to process document doc-bsf89sdb89sdb98sdh8s9hb: 'metadata'
```
So far, everything appears to work fine for me with newer versions:
- version 1.3.0 (`pip install lightrag-hku`)
- version 1.3.1 (git clone, then `pip install -e .`)
Hi, thanks for the clarification. Can you please check and let me know the correct git tag/commit? I ran `git describe` just now after cloning and got v1.3.0-143-g7cf6381.
Hi! I cloned it this morning:

```shell
$ git describe --tags
v1.3.0-143-g7cf6381
$ pip list | grep lightrag
lightrag-hku    1.3.1
```
If it can be of any help, here's the code I used for the batch insertion (NB: Jupyter notebook):

```python
import os
import asyncio
import time

from dotenv import load_dotenv
from IPython.display import Markdown

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status

load_dotenv()

# ONLY FOR JUPYTER NOTEBOOK
import nest_asyncio
nest_asyncio.apply()

# Define the working directory (where the ragstore is saved)
WORKING_DIR = "./graph_store"
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# Initialize the rag
async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
        # llm_model_func=gpt_4o_complete
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

rag = asyncio.run(initialize_rag())

# Define where the md files are located
md_in_folder = "./source_mds"

# Get sorted list of Markdown files
md_files = sorted(f for f in os.listdir(md_in_folder) if f.endswith(".md"))

# Read documents and collect file paths (context manager closes each file)
documents = []
file_paths = []
for f in md_files:
    path = os.path.join(md_in_folder, f)
    with open(path, "r", encoding="utf-8") as fh:
        documents.append(fh.read())
    file_paths.append(path)

# Insert all md files
rag.insert(documents, file_paths=file_paths)
```
Hey, thanks for this. I'm actually trying to replace the prompt.py file with my own, and I get the error I shared only when using my own file. I modified everything to look exactly like the original prompt.py. Any idea how to resolve it?
May I ask if you could run examples/lightrag_openai_compatible_demo.py with version 1.3.0? It gives me "failed to extract entity and relationship".
Hey, I identified the issue yesterday. My custom entity-extraction examples contain a lot of LaTeX equations. This caused the error: the script was trying to replace the delimiter placeholders with the delimiter values and got confused between the LaTeX braces and the actual delimiter braces. I switched from that approach to creating a custom_kg, which solved it completely. I also modified operate.py to manually replace those exact delimiter placeholders. Thanks
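For anyone else hitting this, here is a minimal sketch of the kind of clash described above, assuming the placeholders are filled in with Python's `str.format` (the template text and the `tuple_delimiter` name are illustrative, not necessarily LightRAG's actual internals). Note how the LaTeX braces produce a bare `KeyError: 'p'`, matching the `'p'` seen in the error message earlier in this thread, while an exact `str.replace` on the placeholder is immune:

```python
# Illustrative template: one real placeholder plus a LaTeX fragment.
template_ok = "entity{tuple_delimiter}relation"
template_latex = r"entity{tuple_delimiter}relation, e.g. \frac{p}{q}"

values = {"tuple_delimiter": "<|>"}

# The plain placeholder substitutes fine:
print(template_ok.format(**values))  # entity<|>relation

# But str.format also treats the LaTeX braces {p} and {q} as fields,
# so it raises KeyError: 'p'.
try:
    template_latex.format(**values)
except KeyError as e:
    print("KeyError:", e)

# Replacing only the exact placeholder string avoids the collision:
safe = template_latex.replace("{tuple_delimiter}", values["tuple_delimiter"])
print(safe)  # entity<|>relation, e.g. \frac{p}{q}
```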
May I ask if you could run examples/lightrag_openai_compatible_demo.py with version 1.3.0? The same markdown file works fine with LightRAG version 1.2.3, but 1.3.0 reports "failed to extract entity and relationship".
Same here. The original prompt.py seems to have changed since version 1.2.6; you could try checking your prompt.py against it line by line.
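One low-effort way to do that line-by-line check is a unified diff via the stdlib `difflib` (the file contents below are placeholders; in practice you would read your own prompt.py and the packaged one):

```python
import difflib

def prompt_diff(mine: str, original: str) -> list[str]:
    """Return a unified diff between two prompt file contents."""
    return list(difflib.unified_diff(
        mine.splitlines(keepends=True),
        original.splitlines(keepends=True),
        fromfile="my_prompt.py",
        tofile="lightrag/prompt.py",
    ))

# Placeholder contents standing in for the two files:
mine = 'DELIMITER = "<|>"\nEXAMPLE = "foo"\n'
orig = 'DELIMITER = "<|>"\nEXAMPLE = "bar"\n'

for line in prompt_diff(mine, orig):
    print(line, end="")
```

Any `-`/`+` pairs in the output point at exactly the lines where your copy has drifted from the shipped version.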
Same problem:

```
ERROR: Failed to extract entities and relationships
ERROR: Failed to process document doc-5738d6560412c1f29d64858ea4e80442:
```

```shell
$ git describe --tags
v1.3.1-12-g2abc26b
$ pip list | grep lightrag
lightrag-hku    1.3.2
```

I tried with the same .env as in the https://github.com/HKUDS/LightRAG/tree/main/lightrag/api example, using Ollama for both the LLM and the embeddings. I also tested with the Christmas Carol book example: same error.
Please verify if the issue is resolved with the latest version.