kotaemon Issue: Error in create_base_entity_graph Step During Indexing

Description

Hi, first of all, thank you for this great project! I have been experimenting with it and ran into an issue that I hope you can help me with.

I cloned the repository and followed the instructions to upload and index a PDF file. The file was successfully uploaded and processed into chunks, but the process failed at the create_base_entity_graph step. Below are the details:

Reproduction steps

1. Uploaded a PDF file.
2. The file was converted to text and processed into 432 chunks.
3. The indexing process started and created base text units and extracted entities.
4. The process failed at the create_base_entity_graph step.

Screenshots

No response

Logs

Indexing [1/1]: ewfacve.pdf
 => Converting ewfacve.pdf to text
 => Converted ewfacve.pdf to text
 => [ewfacve.pdf] Processed 432 chunks
 => Finished indexing ewfacve.pdf
[GraphRAG] Creating index... This can take a long time.
Logging enabled at /app/ktem_app_data/user_data/files/graphrag/56bbcd91-b07f-4ca4-a904-cce71bdf4571/output/20240828-110528/reports/indexing-engine.log

🚀 create_base_text_units
                                   id  ... n_tokens
0    b85337bedc1f1de271961c7251d46b5c  ...      918
1    0b9e954415cd9c6376528587903a9a96  ...      748
2    d03cbffc5b626210553dd466e022d2ee  ...      891
3    d28fa2ce915c137677eaaa2aabdaa304  ...      732
4    3a7d2941f7ba32940874d0f06f9f62f4  ...     1141
..                                ...  ...      ...
132  54d5b9a2b71895341def789d88113ef4  ...     1200
133  53f5ed0fb3870cf8e59a8f55c1271f3a  ...      398
134  ea40a2366f7ad18819cfa804827eb4d4  ...     1183
135  f1e686b8519e7d93ce6e461d9fa8d1a0  ...       83
136  9eac5a3d46b4b013fef67b5c78330d89  ...      886
[275 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠋ GraphRAG Indexer
├── Loading Input (text) - 216 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

Browsers

Chrome

OS

Windows

Additional information

No response

Aug 28 '24 12:08 Tom1009840152

I am also facing the same issue. It would help if someone could share a working .env file, as I believe we also need to add the GraphRAG config.

Aug 28 '24 12:08 mohdyasa

Hi, there is a high chance that these environment variables are not set properly.

# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small

These settings are in the .env file. You can try following the procedure described at https://stackoverflow.com/questions/48607302/using-env-files-to-set-environment-variables-in-windows

Aug 28 '24 12:08 taprosoft

Thank you for your answer. I have made the configuration changes and can use RAG normally, but I cannot use GraphRAG. I guess I need to configure GraphRAG separately?

Aug 28 '24 12:08 Tom1009840152

Yes. You need to set up the environment variables above. Due to a limitation of our current implementation, these GraphRAG environment variables are not read automatically from .env. We will work on an easier way to set GraphRAG parameters in the UI in the next release.
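
Since the variables have to be present in the process environment rather than only in the .env file, a quick check can help narrow things down. The following is a minimal sketch, assuming only the three variable names from the sample .env quoted above; the helper name missing_graphrag_vars is hypothetical and not part of kotaemon or GraphRAG.

```python
import os

# Variable names taken from the sample .env quoted in this thread.
REQUIRED_VARS = ("GRAPHRAG_API_KEY", "GRAPHRAG_LLM_MODEL", "GRAPHRAG_EMBEDDING_MODEL")


def missing_graphrag_vars(environ=None):
    """Return the names of required variables that are unset or blank.

    `environ` defaults to the real process environment; a plain dict can be
    passed instead for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_graphrag_vars()
    if missing:
        print("Not visible to this process:", ", ".join(missing))
    else:
        print("All GraphRAG variables are visible to this process.")
```

Running this with the same interpreter and in the same shell (or container) that launches app.py shows whether the variables actually reach the process.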

Aug 28 '24 13:08 taprosoft

What should go in GRAPHRAG_API_KEY=openai_key? Should I use the same Azure OpenAI key that I am using for AzureOpenAI?

Aug 28 '24 15:08 mohdyasa

Got the same error on Mac. Setting up the environment variables solved the issue, using dotenv run -- python app.py
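
For readers unsure why this helps: the command loads the .env file into the environment before app.py starts. The sketch below illustrates that idea with the standard library only. It is a simplified stand-in, not the python-dotenv implementation: it handles plain KEY=VALUE lines and # comment lines, and does not cover inline comments, multi-line values, or variable expansion. The function names are hypothetical.

```python
import os
import subprocess


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def run_with_env_file(env_path, command):
    """Run `command` with the file's variables layered over the current environment."""
    with open(env_path, encoding="utf-8") as handle:
        merged = {**os.environ, **parse_env_file(handle.read())}
    return subprocess.run(command, env=merged, check=False).returncode


# Example (not executed here):
#   run_with_env_file(".env", ["python", "app.py"])
```

The child process inherits the merged environment, which is why GraphRAG can then see GRAPHRAG_API_KEY and the model settings.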

Aug 28 '24 15:08 Laksh-star

@Laksh-star what value did you provide in GRAPHRAG_API_KEY? I am using Azure OpenAI

Aug 28 '24 15:08 mohdyasa

I used OpenAI, so I gave that key as the value for GRAPHRAG_API_KEY. Logically, you should use your Azure OpenAI key, but the sample env file here gives this:

# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small

You can try with your Azure key first.

Aug 28 '24 16:08 Laksh-star

I put my AzureOpenAI key in place of openai_key in GRAPHRAG_API_KEY=openai_key, but it didn't work.

Aug 28 '24 16:08 mohdyasa

For Azure OpenAI, the setup is a bit more complicated than for regular OpenAI. Please follow https://microsoft.github.io/graphrag/posts/config/env_vars/

Aug 28 '24 16:08 taprosoft

Also, use the command suggested in https://github.com/Cinnamon/kotaemon/issues/140#issuecomment-2315706967 to load the variables from the .env file at startup.

Aug 28 '24 16:08 taprosoft

Or, alternately https://pypi.org/project/python-dotenv-run/

Aug 28 '24 16:08 taprosoft

Hi - can we use a local / Ollama embedding model instead of using something requiring an API key?

Aug 29 '24 13:08 ryansh1x

@ryansh1x Yes, you can. Anything that exposes an OpenAI-compatible API will work. You just need to add your embedding endpoint and then create a new file collection that uses that endpoint. Also, please consider opening a separate issue so that others with the same question can refer to it.
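
To make "OpenAI-compatible" concrete, here is a minimal sketch of an embeddings request against a local server, using only the standard library. The base URL assumes Ollama's default local port and its OpenAI-compatible /v1 prefix, and the model name is the one mentioned elsewhere in this thread; both are assumptions to adjust for your setup, and the function names are hypothetical. This is not how kotaemon itself calls the endpoint.

```python
import json
import urllib.request

# Assumed values: Ollama's default local address and an embedding model
# that has already been pulled locally.
BASE_URL = "http://localhost:11434/v1"
MODEL = "nomic-embed-text"


def build_embedding_request(text, base_url=BASE_URL, model=MODEL, api_key="ollama"):
    """Build a POST request in the OpenAI embeddings format."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Local servers typically ignore the key, but clients expect one.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(text):
    """Send the request and return the first embedding vector."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as resp:
        return json.load(resp)["data"][0]["embedding"]
```

If embed("hello") returns a list of floats, the endpoint is reachable and can be registered as an embedding endpoint in the app.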

Aug 29 '24 14:08 lone17

> @ryansh1x Yes, you can. Anything that exposes an OpenAI-compatible API will work. You just need to add your embedding endpoint and then create a new file collection that uses that endpoint. Also, please consider opening a separate issue so that others with the same question can refer to it.

Will do shortly, and I appreciate your input. Once I've created the new issue, I'd appreciate a more noob-level pointer. If I can get the graph portion of this working locally, I'll be able to argue for switching to this for the majority of my use case.

Aug 29 '24 14:08 ryansh1x

@ryansh1x sure thing, feel free to try it out and submit any issue/question. I'm not familiar with setting up the graph portion as I'm also a noob myself, but I'm sure other folks can help you out. Cheers !

Aug 29 '24 15:08 lone17

> Hi - can we use a local / Ollama embedding model instead of using something requiring an API key?

Note that GraphRAG with the Ollama OpenAI endpoint is also a bit tricky. Please wait while we streamline this process for mainstream users.

Aug 29 '24 15:08 taprosoft

Had similar issues with ollama and graphrag. Setting the graphrag version to 0.3.2 fixed the issues for me. I can now use ollama with graphrag. Edit this line: "RUN pip install graphrag future" to "RUN pip install graphrag==0.3.2 future", then rebuild the image.
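
After rebuilding, it can be worth confirming that the pinned release is what actually ended up in the image. The snippet below is a small sketch using the standard library's importlib.metadata; the helper names are hypothetical, and it should be run inside the container with the same Python that runs app.py.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """True only when the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    print("graphrag version:", installed_version("graphrag"))
    print("matches 0.3.2:", matches_pin("graphrag", "0.3.2"))
```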

Aug 30 '24 09:08 Neptoos

GraphRAG chat does not work.

Sep 01 '24 15:09 bookandlover

> Had similar issues with ollama and graphrag. Setting the graphrag version to 0.3.2 fixed the issues for me. I can now use ollama with graphrag. Edit this line: "RUN pip install graphrag future" to "RUN pip install graphrag==0.3.2 future", then rebuild the image.

Could you please share what your GraphRAG variables in the .env are?

I have set the following, but it is still not working for me:

# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=llama3:70b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text

Sep 02 '24 15:09 ikros98

Hello, I also have the same problem. Changing the line "RUN pip install graphrag future" to "RUN pip install graphrag==0.3.2 future" and rebuilding the image did not resolve it; I get the same error as Tom1009840152. Do you know of a solution, please?

Sep 02 '24 19:09 TILKPROD

> Got the same error on Mac. Setting up the environment variables solved the issue, using dotenv run -- python app.py

I encountered the same problem on a MacBook Pro M1, and the method above worked well. The issue is that the env parameters are not being read correctly. I hope the next version fixes this bug. Also, I found that some of the reasoning methods cannot read the GraphRAG information.

Sep 04 '24 13:09 clark874

Same problem here ...

Oct 22 '24 16:10 bentonglove

Hi everyone, I recently started using kotaemon and encountered the same problem. Thanks to the suggestions from the comments above, I was able to solve it.

I installed it using Docker and accessed the container through an IDE or Docker Desktop. The .env file is located at /app/.env. Set the config variables like this:

# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small

Then, run the command dotenv run -- python app.py to apply the configuration.

After that, GraphRAG worked normally! I hope this helps resolve the issue.

Oct 24 '24 10:10 TSteven415

> Hi everyone, I recently started using kotaemon and encountered the same problem. Thanks to the suggestions from the comments above, I was able to solve it.
>
> I installed it using Docker and accessed the container through an IDE or Docker Desktop. The .env file is located at /app/.env. Set the config variables like this:
>
> # settings for GraphRAG
> GRAPHRAG_API_KEY=openai_key
> GRAPHRAG_LLM_MODEL=gpt-4o-mini
> GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
>
> Then, run the command dotenv run -- python app.py to apply the configuration.
>
> After that, GraphRAG worked normally! I hope this helps resolve the issue.

Hi, thanks for putting this out there.

I followed the exact same steps except I had a problem where the app would just launch right away when I created the container rather than allowing me to write 'dotenv run -- python app.py' in the terminal. So I added it in the docker command like so:

I installed using Docker and mounted my .env file (similar to yours) using the following command:

docker create \
  --name kotaemon_container \
  -v /mnt/c/Users/abhis/.env:/app/.env \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e GRADIO_SERVER_PORT=7860 \
  -p 7860:7860 \
  ghcr.io/cinnamon/kotaemon:main-full \
  /bin/bash -c "dotenv run -- python app.py"

.env:

GRAPHRAG_API_KEY=api_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small

Started the container

However, I still get the same ❌ create_base_entity_graph issue. I've verified that my API key is indeed in the project's /app/.env file as well.

Any other suggestions or things you did?

Oct 28 '24 06:10 kakolla

Have there been any updates on this? I have the same issue as @abhishekkakolla above.

Nov 12 '24 02:11 atjain02
