[Bug]: Failed to extract entities and relationships

Open FeHuynhVI opened this issue 8 months ago • 11 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

Failed to extract entities and relationships
Failed to process document doc-b15c2eafcc86764e2b11ad628d818ff4: 'metadata'

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

Paste your config here

Logs and screenshots

No response

Additional Information

  • LightRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:

FeHuynhVI avatar Mar 28 '25 09:03 FeHuynhVI

What LightRAG version do you have? I had this error myself, but it seems to be fixed in the latest version.

JoramMillenaar avatar Mar 28 '25 17:03 JoramMillenaar

Failed to extract entities and relationships
Failed to process document doc-xxxxxx.....: 'p'

I was using the cloned repo yesterday.

harshffy avatar Mar 29 '25 06:03 harshffy

I experienced the same problem with version 1.2.6. Specifically, the very first document was always processed without a problem, while the error below occurred for every subsequent document. (I tried both batch insertion and inserting the documents manually one after another.)

Failed to extract entities and relationships
Failed to process document doc-bsf89sdb89sdb98sdh8s9hb: 'metadata'

So far, everything appears to work fine for me with newer versions:

  • version 1.3.0 (pip install lightrag-hku)
  • version 1.3.1 (git clone and then pip install -e . )
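
Since the behaviour above seems to depend on the installed version, it may help to report the version from inside the same Python environment that runs the insertion. A minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "lightrag-hku"):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper, not part of LightRAG.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or a note when the package is not installed.
    print("lightrag-hku:", installed_version() or "not installed")
```

This reads the distribution metadata (the same information `pip list` shows), so for an editable install it reflects the version declared by the checkout, not the git tag.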

leonardocerliani avatar Mar 29 '25 06:03 leonardocerliani

Hi, thanks for the clarification. Could you check and let me know which git tag/commit you are on? I ran git describe just after cloning and got v1.3.0-143-g7cf6381.

harshffy avatar Mar 29 '25 07:03 harshffy

Hi! I cloned it this morning

git describe --tags
v1.3.0-143-g7cf6381

pip list | grep lightrag
lightrag-hku            1.3.1

If it can be of any help, here's the code I used for the batch insertion (NB: jupyter nb):

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
import time
from IPython.display import Markdown

from dotenv import load_dotenv
load_dotenv()

# ONLY FOR JUPYTER NOTEBOOK
import nest_asyncio
nest_asyncio.apply()

# Define the working directory (where the ragstore is saved)
WORKING_DIR = "./graph_store"
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)


# Initialize the rag
async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
        # llm_model_func=gpt_4o_complete
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

rag = asyncio.run(initialize_rag())


# Define where the md files are located
md_in_folder = "./source_mds"

# Get sorted list of Markdown files
md_files = sorted([f for f in os.listdir(md_in_folder) if f.endswith(".md")])

# Define documents and file paths lists
documents = [open(os.path.join(md_in_folder, f), "r", encoding="utf-8").read() for f in md_files]
file_paths = [os.path.join(md_in_folder, f) for f in md_files]

# Insert all md files
rag.insert(documents, file_paths=file_paths)

leonardocerliani avatar Mar 29 '25 10:03 leonardocerliani

Hey, thanks for this. I'm actually trying to replace the prompt.py file with my own, and I only get the error I shared when I use my own file. I modified everything to look exactly like the original prompt.py file. Any idea how to resolve it?

harshffy avatar Mar 29 '25 11:03 harshffy

Could you try running examples/lightrag_openai_compatible_demo.py with version 1.3.0? For me it reports that it failed to extract entities and relationships.

Buzeg avatar Mar 30 '25 09:03 Buzeg

Hey, I identified the issue yesterday. My custom entity extraction examples contain a lot of LaTeX equations. The error occurred when the script was trying to replace the delimiter placeholders with the delimiter values and confused the LaTeX brackets with the actual delimiter brackets. I switched from this approach to creating a custom_kg, and that solved it completely. I also modified the operate.py file to manually replace only those exact delimiter placeholders. Thanks
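
The failure mode described above can be reproduced outside LightRAG. The sketch below assumes the examples are filled in with Python's `str.format` (an assumption, not verified against every LightRAG version); the delimiter values and the example string are made up for illustration. It shows why LaTeX groups such as `\frac{p}{q}` produce an error message that is just a quoted key like `'p'`, and how replacing only the exact placeholders avoids it:

```python
# Hypothetical delimiter values; the names follow the placeholders discussed
# above, but the concrete strings are assumptions for this sketch.
DELIMITERS = {
    "tuple_delimiter": "<|>",
    "record_delimiter": "##",
    "completion_delimiter": "<|COMPLETE|>",
}

# A made-up custom example that mixes real placeholders with LaTeX braces.
EXAMPLE = r'("entity"{tuple_delimiter}"\frac{p}{q}"{tuple_delimiter}"ratio"){record_delimiter}'


def fill_with_format(template: str) -> str:
    # str.format treats every {...} as a replacement field, so the LaTeX
    # group {p} is looked up as a key and raises KeyError.
    return template.format(**DELIMITERS)


def fill_with_replace(template: str) -> str:
    # Replace only the exact, known placeholders; all other braces stay as-is.
    for name, value in DELIMITERS.items():
        template = template.replace("{" + name + "}", value)
    return template


if __name__ == "__main__":
    try:
        fill_with_format(EXAMPLE)
    except KeyError as exc:
        print("str.format failed with KeyError:", exc)
    print(fill_with_replace(EXAMPLE))
```

If `str.format` has to stay, doubling the LaTeX braces in the examples (`\frac{{p}}{{q}}`) is the standard way to escape them.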

harshffy avatar Mar 30 '25 09:03 harshffy

Could you try running examples/lightrag_openai_compatible_demo.py with version 1.3.0? The same markdown file works fine for me with LightRAG version 1.2.3, but with 1.3.0 it reports that it failed to extract entities and relationships.

Buzeg avatar Mar 30 '25 10:03 Buzeg

Same here with a custom prompt.py. It seems the original prompt.py has changed since version 1.2.6, so you could try comparing your file against the current prompt.py entry by entry.
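
One way to do that comparison is a plain text diff between the shipped file and the custom one. A minimal sketch with the standard library; the function names and the file paths in the usage comment are placeholders, not anything LightRAG provides:

```python
import difflib
from pathlib import Path


def diff_text(original: str, custom: str,
              original_name: str = "original", custom_name: str = "custom") -> list[str]:
    """Return unified-diff lines between two file contents (empty if identical)."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            custom.splitlines(),
            fromfile=original_name,
            tofile=custom_name,
            lineterm="",
        )
    )


def diff_files(original_path: str, custom_path: str) -> list[str]:
    """Read two files as UTF-8 and diff their contents."""
    original = Path(original_path).read_text(encoding="utf-8")
    custom = Path(custom_path).read_text(encoding="utf-8")
    return diff_text(original, custom, original_path, custom_path)


# Usage (paths are hypothetical; point them at your own checkout and file):
# print("\n".join(diff_files("LightRAG/lightrag/prompt.py", "my_prompt.py")))
```

Lines starting with `-` exist only in the original and lines starting with `+` only in the custom file, which makes renamed or missing entries easy to spot.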

Buzeg avatar Mar 31 '25 07:03 Buzeg

Same problem:

ERROR: Failed to extract entities and relationships
ERROR: Failed to process document doc-5738d6560412c1f29d64858ea4e80442:

git describe --tags
v1.3.1-12-g2abc26b

lightrag-hku 1.3.2

I tried with the same .env as in the https://github.com/HKUDS/LightRAG/tree/main/lightrag/api example, with ollama and ollama embeddings. I also tested this with the Christmas Carol book example and got the same error.

akupka avatar Apr 07 '25 15:04 akupka

Please verify if the issue is resolved with the latest version.

danielaskdd avatar Jul 20 '25 01:07 danielaskdd