Issue: Error in create_base_entity_graph Step During Indexing
Description
Hi, first of all, thank you for this great project! I have been experimenting with it and encountered an issue that I hope you can help me with.
I cloned the repository and followed the instructions to upload and index a PDF file. The file was successfully uploaded and processed into chunks, but the process failed at the create_base_entity_graph step. Below are the details:
Reproduction steps
1. Uploaded a PDF file.
2. The file was converted to text and processed into 432 chunks.
3. The indexing process started and created base text units and extracted entities.
4. The process failed at the create_base_entity_graph step.
Screenshots
No response
Logs
Indexing [1/1]: ewfacve.pdf
=> Converting ewfacve.pdf to text
=> Converted ewfacve.pdf to text
=> [ewfacve.pdf] Processed 432 chunks
=> Finished indexing ewfacve.pdf
[GraphRAG] Creating index... This can take a long time.
Logging enabled at /app/ktem_app_data/user_data/files/graphrag/56bbcd91-b07f-4ca4-a904-cce71bdf4571/output/20240828-110528/reports/indexing-engine.log
🚀 create_base_text_units
id ... n_tokens
0 b85337bedc1f1de271961c7251d46b5c ... 918
1 0b9e954415cd9c6376528587903a9a96 ... 748
2 d03cbffc5b626210553dd466e022d2ee ... 891
3 d28fa2ce915c137677eaaa2aabdaa304 ... 732
4 3a7d2941f7ba32940874d0f06f9f62f4 ... 1141
.. ... ... ...
132 54d5b9a2b71895341def789d88113ef4 ... 1200
133 53f5ed0fb3870cf8e59a8f55c1271f3a ... 398
134 ea40a2366f7ad18819cfa804827eb4d4 ... 1183
135 f1e686b8519e7d93ce6e461d9fa8d1a0 ... 83
136 9eac5a3d46b4b013fef67b5c78330d89 ... 886
[275 rows x 5 columns]
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠋ GraphRAG Indexer
├── Loading Input (text) - 216 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.
Browsers
Chrome
OS
Windows
Additional information
No response
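The console output above only points at indexing-engine.log for details. A minimal sketch for pulling likely error lines out of that file, assuming it is a plain-text log; the keyword list and the function name find_error_lines are illustrative choices, not part of kotaemon or GraphRAG:

```python
from pathlib import Path


def find_error_lines(log_path, keywords=("error", "traceback", "exception")):
    """Return (line_number, text) pairs for log lines that mention any keyword."""
    hits = []
    # errors="replace" keeps the scan going even if the log has odd bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits


# Example call (the path is whatever "Logging enabled at ..." printed):
# for number, text in find_error_lines("indexing-engine.log"):
#     print(f"{number}: {text}")
```

Posting the matching lines along with the issue usually makes it easier to tell a missing API key apart from a model or endpoint problem.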
I am also facing the same issue. It would help if someone could share a working env file, as I believe we also need to add the GraphRAG config.
Hi, there is a high chance that these environment variables are not set properly.
# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
They are in the .env file. On Windows, you can try following this procedure: https://stackoverflow.com/questions/48607302/using-env-files-to-set-environment-variables-in-windows
Thank you for your answer. I have made the configuration and can use RAG normally, but I cannot use GraphRAG. I guess I need to configure GraphRAG?
Yes. You need to set the environment variables above. Due to a limitation of our current implementation, these GraphRAG environment variables are not read automatically from .env. We will work on an easier way to set GraphRAG parameters in the UI in the next release.
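Since the variables are not picked up from .env automatically, it can help to confirm what the Python process will actually see before starting the app. Below is a rough standard-library sketch, not kotaemon's own loading code and not a replacement for python-dotenv: the helper names are hypothetical, the parser only handles simple KEY=VALUE lines, and the required list is just the three variables quoted in this thread.

```python
import os

# The three variables mentioned in this thread; extend as needed.
REQUIRED = ("GRAPHRAG_API_KEY", "GRAPHRAG_LLM_MODEL", "GRAPHRAG_EMBEDDING_MODEL")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path, environ=None):
    """Copy values from a .env file into environ without overriding existing keys."""
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env_text(fh.read()).items():
            environ.setdefault(key, value)
    return environ


def missing_graphrag_vars(environ):
    """Return the required GraphRAG variable names that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]
```

For example, calling missing_graphrag_vars(os.environ) at the top of a launch script and printing the result shows right away whether the indexer will start without a key or model name.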
What should go in GRAPHRAG_API_KEY=openai_key? Should I use the same Azure OpenAI key that I am using for AzureOpenAI?
Got the same error on Mac. Setting up the environment variables solved the issue, using dotenv run -- python app.py
@Laksh-star what value did you provide in GRAPHRAG_API_KEY? I am using Azure OpenAI
I used openAI, so gave that key value for GRAPHRAG_API_KEY. Logically you should use Azure OpenAI. But the sample env file here gives this.
# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
you can try first with Azure.
I put my AzureOpenAI key in place of openai_key in GRAPHRAG_API_KEY=openai_key, but it didn't work.
For Azure OpenAI, please follow https://microsoft.github.io/graphrag/posts/config/env_vars/ Gonna be a bit more complicated than normal OpenAI
Also use the command suggested here https://github.com/Cinnamon/kotaemon/issues/140#issuecomment-2315706967 to load from .env file at start up.
Or, alternately https://pypi.org/project/python-dotenv-run/
Hi - can we use a local / Ollama embedding model instead of using something requiring an API key?
@ryansh1x yes you can; anything that exposes an OpenAI-compatible API will work. You just need to add your embedding endpoint and then create a new file collection that uses that endpoint. Also, please consider opening a separate issue so that others with the same question can refer to it.
will do, shortly - appreciate your input. Once i've gotten the new issue created, I'd appreciate a more noob-level pointer - if I can get the graph portion of this working locally, I'll be able to argue for swapping into this for the majority of my use case.
@ryansh1x sure thing, feel free to try it out and submit any issue/question. I'm not familiar with setting up the graph portion as I'm also a noob myself, but I'm sure other folks can help you out. Cheers !
Note that using GraphRAG with an Ollama OpenAI-compatible endpoint is also a bit tricky. Please wait while we streamline this process for mainstream users.
Had similar issues with ollama and graphrag. Setting the graphrag version to 0.3.2 fixed the issues for me. I can now use ollama with graphrag. Edit this line: "RUN pip install graphrag future" to "RUN pip install graphrag==0.3.2 future" (pip needs a double equals sign to pin a version). Rebuild the image.
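After rebuilding the image, it is worth confirming which graphrag version actually ended up installed in the container. A small sketch using only the standard library; the helper names are made up for illustration, and 0.3.2 is simply the version reported to work in this comment:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, expected):
    """True only if the package is installed at exactly the expected version."""
    return installed_version(package) == expected


# Example, run inside the container:
# print(installed_version("graphrag"), matches_pin("graphrag", "0.3.2"))
```

If this prints a different version, the Dockerfile edit probably did not take effect, for instance because a cached layer was reused during the build.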
GraphRAG chat does not work.
Regarding the suggestion above to pin graphrag to 0.3.2 for ollama: could you please share the GraphRAG variables in your .env?
I have set the following, but it is still not working for me:
# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=llama3:70b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text
Hello, I also have the same problem. Replacing the line "RUN pip install graphrag future" with "RUN pip install graphrag=0.3.2 future" and rebuilding the image did not resolve it; I get the same error as Tom1009840152. Do you know of a solution, please?
I encountered the same problem on a MacBook Pro M1, and running dotenv run -- python app.py as suggested above is very effective. The issue lies in the env parameters not being read correctly. I hope the next version can fix this bug. Also, I found that some of the reasoning methods cannot read the GraphRAG information.
Same problem here ...
Hi everyone, I recently started using kotaemon and encountered the same problem. Thanks to the suggestions from the comments above, I was able to solve it.
I installed it using Docker and accessed the container through an IDE or Docker Desktop. The .env file is located at /app/.env. Set the config variables like this:
# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
Then, run the command dotenv run -- python app.py to apply the configuration.
After that, GraphRAG worked normally! I hope this helps resolve the issue.
Hi, thanks for putting this out there.
I followed the exact same steps except I had a problem where the app would just launch right away when I created the container rather than allowing me to write 'dotenv run -- python app.py' in the terminal. So I added it in the docker command like so:
- Installed using docker and mounted my .env file (similar to yours) using the following command
docker create \
  --name kotaemon_container \
  -v /mnt/c/Users/abhis/.env:/app/.env \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e GRADIO_SERVER_PORT=7860 \
  -p 7860:7860 \
  ghcr.io/cinnamon/kotaemon:main-full \
  /bin/bash -c "dotenv run -- python app.py"
.env:
GRAPHRAG_API_KEY=api_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
- Started the container
However, I still get the same ❌ create_base_entity_graph issue. I've verified that my API key is indeed in the project's /app/.env file as well.
Any other suggestions or things you did?
Have there been any updates on this? I have the same issue as @abhishekkakolla above.