[BUG] Failure to chat with GraphRAG
Description
Setting up quick upload event
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
User-id: None, can see public conversations: False
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x00000273B5140CA0>, FSPath=WindowsPath('R:/kotaemon-app/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x00000273B5140F40>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EB60>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EF20>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734D420>), mmr=False, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, FSPath=<theflow.base.unset_ object at 0x00000273FB1E1F60>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x00000273FB1E1F60>, VS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, file_ids=['e6ae8d9e-2419-47bd-b6e2-3607d7f5ced2'], user_id=<theflow.base.unset_ object at 0x00000273FB1E1F60>)]
searching in doc_ids []
Traceback (most recent call last):
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1923, in process_api
result = await self.call_function(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio_backends_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio_backends_asyncio.py", line 943, in run
result = context.run(func, *args)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 801, in gen_wrapper
response = next(iterator)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\pages\chat_init_.py", line 899, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 705, in stream
docs, infos = self.retrieve(message, history)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 503, in retrieve
retriever_docs = retriever_node(text=query)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1097, in call
raise e from None
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1088, in call
output = self.fl.exec(func, args, kwargs)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\backends\base.py", line 151, in exec
return run(*args, **kwargs)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 144, in call
raise e from None
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 141, in call
_output = self.next_call(*args, **kwargs)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 117, in call
return self.next_call(*args, **kwargs)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 345, in run
context_builder = self._build_graph_search()
File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 204, in _build_graph_search
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 667, in read_parquet
return impl.read(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 267, in read
path_or_handle, handles, filesystem = _get_path_or_handle(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 140, in _get_path_or_handle
handles = get_handle(
File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\common.py", line 882, in get_handle
handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'R:\kotaemon-app\ktem_app_data\user_data\files\graphrag\a8af56b7-550c-4f92-ba60-fcf2163838b7\output/create_final_nodes.parquet'
User-id: 1, can see public conversations: True
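The FileNotFoundError above is the retriever failing to read GraphRAG's final output tables, which suggests the indexing step never wrote them. Below is a minimal diagnostic sketch (not kotaemon code) to check what the indexing run actually produced; the root path and job id are copied from the error message and should be adjusted to your own install:

```python
from pathlib import Path

# Root and job id taken from the FileNotFoundError above; substitute your own.
job_dir = Path(r"R:\kotaemon-app\ktem_app_data\user_data\files\graphrag"
               r"\a8af56b7-550c-4f92-ba60-fcf2163838b7")
output_dir = job_dir / "output"

print("output dir exists:", output_dir.exists())
# The retriever expects GraphRAG's final tables here (create_final_nodes.parquet,
# create_final_entities.parquet, ...). An empty listing means indexing failed
# before writing its results, so chat retrieval cannot work.
for p in sorted(output_dir.rglob("*.parquet")):
    print(p.relative_to(job_dir))
```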
Reproduction steps
1. Go to 'File->GraphRAG'
2. Click on 'Upload'
3. Ask anything in the chat
4. See error
Screenshots

Logs
No response
Browsers
No response
OS
No response
Additional information
I installed it with "....bat" on a Windows system.
Same problem here: I am already on the latest version on Linux, installed with run_linux.sh, and the GraphRAG part is still not working.
I tried modifying the run_linux.sh part like below:

```sh
if pip list 2>/dev/null | grep -q "kotaemon"; then
    python -m pip install graphrag future  # new line
    echo "Requirements are already installed"
else
    ..........
```

Now I get this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 4.39.0 requires aiofiles<24.0,>=22.0, but you have aiofiles 24.1.0 which is incompatible.
kotaemon 0.7.0 requires tenacity<8.3,>=8.2.3, but you have tenacity 9.0.0 which is incompatible.
langchain 0.2.15 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.2.11 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.2.41 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-core 0.10.68.post1 requires tenacity!=8.4.0,<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-legacy 0.9.48.post3 requires tenacity<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
So it seems to be an environment error. I have not yet looked into the pip list, which may reveal something.
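For what it is worth, here is a quick diagnostic sketch that lists the versions pip actually left installed for the packages named in the resolver complaints above:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the dependency conflicts above.
for pkg in ("gradio", "aiofiles", "tenacity", "kotaemon", "graphrag",
            "langchain-core", "llama-index-core"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```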
Did you set the GraphRAG API key correctly as mentioned in https://github.com/Cinnamon/kotaemon#setup-graphrag?
I got the same error even though I have set the GraphRAG API key in the .env file.
The same error persists for me as well, with the GraphRAG API key set in the .env file.
I also have the same error.

settings.yaml:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  api_base: http://127.0.0.1:11434/v1
  model: llama3.1:8b
  model_supports_json: true # recommended if this is available for your model.
  request_timeout: 1800.0
  concurrent_requests: 5 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_chat # or azure_openai_chat
    api_base: http://127.0.0.1:11434/v1
    model: nomic-embed-text
    type: openai_embedding
    # api_base: https://

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  num_walks: 10
  walk_length: 40
  window_size: 2
  iterations: 3
  random_seed: 597832

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true

local_search:
  text_unit_prop: 0.5
  community_prop: 0.1
  conversation_history_max_turns: 5
  top_k_mapped_entities: 10
  top_k_relationships: 10
  llm_temperature: 0 # temperature for sampling
  llm_top_p: 1 # top-p sampling
  llm_n: 1 # Number of completions to generate
  max_tokens: 12000

global_search:
  llm_temperature: 0 # temperature for sampling
  llm_top_p: 1 # top-p sampling
  llm_n: 1 # Number of completions to generate
  max_tokens: 12000
  data_max_tokens: 12000
  map_max_tokens: 1000
  reduce_max_tokens: 2000
  concurrency: 32
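As a sanity check on this config, it can be loaded to confirm it parses and that the environment variable referenced by ${GRAPHRAG_API_KEY} is set. A rough sketch assuming PyYAML is available; note that plain yaml.safe_load does not expand ${...} references (GraphRAG does that itself), and PyYAML silently keeps only the last of the two type: keys under embeddings.llm:

```python
import os
import yaml  # PyYAML; assumed available (pip install pyyaml)

with open("settings.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print("llm api_base:        ", cfg["llm"]["api_base"])
# Duplicate keys do not raise in PyYAML; the last one wins, so this should
# print "openai_embedding" rather than "openai_chat".
print("embeddings llm type: ", cfg["embeddings"]["llm"]["type"])
print("GRAPHRAG_API_KEY set:", bool(os.environ.get("GRAPHRAG_API_KEY")))
```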
.env:

# settings for OpenAI
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_BASE=https://api.deepseek.com/v1
OPENAI_API_KEY=
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

# settings for Azure OpenAI
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

# settings for Cohere
COHERE_API_KEY=<COHERE_API_KEY>

# settings for local models
LOCAL_MODEL=llama3.1:8b
LOCAL_MODEL_EMBEDDINGS=nomic-embed-text

# settings for GraphRAG
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
GRAPHRAG_LLM_MODEL=llama3.1:8b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text

# set to true if you want to use customized GraphRAG config file
USE_CUSTOMIZED_GRAPHRAG_SETTING=true

# settings for Azure DI
AZURE_DI_ENDPOINT=
AZURE_DI_CREDENTIAL=

# settings for Adobe API
# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
# also install pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
PDF_SERVICES_CLIENT_ID=
PDF_SERVICES_CLIENT_SECRET=

# settings for PDF.js
PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"
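A small check that the values in this .env actually reach the process environment; python-dotenv here is an assumption for running the check standalone (kotaemon does its own .env loading at startup):

```python
import os
from dotenv import load_dotenv  # python-dotenv; assumed installed for this standalone check

load_dotenv(".env")
# Variable names taken from the .env above.
for name in ("GRAPHRAG_API_KEY", "GRAPHRAG_LLM_MODEL",
             "GRAPHRAG_EMBEDDING_MODEL", "USE_CUSTOMIZED_GRAPHRAG_SETTING"):
    print(name, "=", os.environ.get(name))
```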
(base) root@autodl-container-3c3348b04d-889a978b:~# ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling ce4a164fc046... 100% ▕████████████████▏   17 B
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B
verifying sha256 digest
writing manifest
success

(base) root@autodl-container-3c3348b04d-889a978b:~/autodl-tmp/kotaemon_071# ollama pull llama3.1:8b
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
pulling 948af2743fc7... 100% ▕████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕████████████████▏  12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B
pulling 1a4c3c319823... 100% ▕████████████████▏  485 B
verifying sha256 digest
writing manifest
success
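Since settings.yaml points api_base at Ollama's OpenAI-compatible endpoint, it is worth probing that endpoint directly before suspecting kotaemon. A minimal sketch using the model pulled above and the requests package (assumed installed):

```python
import requests

# Ollama exposes an OpenAI-compatible API under /v1.
resp = requests.post(
    "http://127.0.0.1:11434/v1/chat/completions",
    json={
        "model": "llama3.1:8b",  # pulled above
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this fails or hangs, the GraphRAG indexing run would also fail before writing its parquet output, which matches the FileNotFoundError.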
Same error here; output/create_final_nodes.parquet is missing:

FileNotFoundError: [Errno 2] No such file or directory: '/app/ktem_app_data/user_data/files/graphrag/d6d06e52-7acf-4ec6-b1f0-ec84b86fedaa/output/create_final_nodes.parquet'
The server I am using is an AutoDL server; I am not sure whether that is related.

FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl-tmp/kotaemon_l/kotaemon/ktem_app_data/user_data/files/graphrag/2d1932f9-2623-406c-b72bnodes.parquet'
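All of these reports show the same symptom: the final parquet tables are missing, so the GraphRAG indexing run most likely died partway through. One generic way to hunt for the indexer's own logs under the job folder (path root taken from the errors above; adjust to your install):

```python
from pathlib import Path

root = Path("ktem_app_data/user_data/files/graphrag")
for log in sorted(root.rglob("*.log")):
    print("====", log)
    # The real failure (auth error, connection refused, timeout) is usually
    # reported here rather than in the chat traceback.
    print(log.read_text(errors="ignore")[-2000:])
```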