
[BUG] Fail to chat with GraphRAG

Open CinderZhang opened this issue 1 year ago • 9 comments

Description

```
Setting up quick upload event
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
User-id: None, can see public conversations: False
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x00000273B5140CA0>, FSPath=WindowsPath('R:/kotaemon-app/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x00000273B5140F40>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EB60>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EF20>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734D420>), mmr=False, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, FSPath=<theflow.base.unset_ object at 0x00000273FB1E1F60>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x00000273FB1E1F60>, VS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, file_ids=['e6ae8d9e-2419-47bd-b6e2-3607d7f5ced2'], user_id=<theflow.base.unset_ object at 0x00000273FB1E1F60>)]
searching in doc_ids []
Traceback (most recent call last):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\pages\chat\__init__.py", line 899, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 705, in stream
    docs, infos = self.retrieve(message, history)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 503, in retrieve
    retriever_docs = retriever_node(text=query)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1097, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\backends\base.py", line 151, in exec
    return run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 144, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 345, in run
    context_builder = self._build_graph_search()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 204, in _build_graph_search
    entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 667, in read_parquet
    return impl.read(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 267, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 140, in _get_path_or_handle
    handles = get_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'R:\kotaemon-app\ktem_app_data\user_data\files\graphrag\a8af56b7-550c-4f92-ba60-fcf2163838b7\output/create_final_nodes.parquet'
User-id: 1, can see public conversations: True
```

Reproduction steps

1. Go to 'File->GraphRAG'
2. Click on 'Upload'
3. Ask anything in the chat 
4. See error

Screenshots


Logs

No response

Browsers

No response

OS

No response

Additional information

I installed it with the "....bat" installer on Windows.

CinderZhang avatar Oct 20 '24 23:10 CinderZhang

Same problem here. I'm already on the latest version on Linux, installed with run_linux.sh, and the GraphRAG part still doesn't work.

I tried modifying the run_linux.sh part like below:

```bash
if pip list 2>/dev/null | grep -q "kotaemon"; then
    python -m pip install graphrag future   # new line
    echo "Requirements are already installed"
else
    ..........
```

Now it fails with:

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 4.39.0 requires aiofiles<24.0,>=22.0, but you have aiofiles 24.1.0 which is incompatible.
kotaemon 0.7.0 requires tenacity<8.3,>=8.2.3, but you have tenacity 9.0.0 which is incompatible.
langchain 0.2.15 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.2.11 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.2.41 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-core 0.10.68.post1 requires tenacity!=8.4.0,<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-legacy 0.9.48.post3 requires tenacity<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
```
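A possible workaround (just a sketch, untested against a fresh install) is to let graphrag install first and then pin the conflicting packages back into the ranges the resolver lists above:

```bash
# Install graphrag, then restore the versions kotaemon/gradio expect.
# The pins below are taken directly from the pip error message above.
python -m pip install graphrag future
python -m pip install "tenacity>=8.2.3,<8.3" "aiofiles>=22.0,<24.0"
```

graphrag itself may then complain about the downgraded tenacity, so treat this as a stopgap rather than a proper fix.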

So it seems to be an environment error. I haven't looked into the pip list yet; that may turn up something.

CaMi1le avatar Oct 21 '24 01:10 CaMi1le

Did you set the GraphRAG API key correctly as mentioned in https://github.com/Cinnamon/kotaemon#setup-graphrag?
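For reference, that setup boils down to having the key defined in the .env file at the app root before indexing, e.g. (placeholder value, not a real key):

```bash
# .env — GraphRAG reads its API key from this variable
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
```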

taprosoft avatar Oct 21 '24 06:10 taprosoft

I got the same error even though I have set the GraphRAG API key in the .env file; it persists with the key in place.

ajayarunachalam avatar Oct 21 '24 15:10 ajayarunachalam

I have the same error. My settings.yaml:

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  api_base: http://127.0.0.1:11434/v1
  model: llama3.1:8b
  model_supports_json: true # recommended if this is available for your model.
  request_timeout: 1800.0
  concurrent_requests: 5 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_chat # or azure_openai_chat
    api_base: http://127.0.0.1:11434/v1
    model: nomic-embed-text
    type: openai_embedding
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
```
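Since this settings.yaml points GraphRAG at a local Ollama server, it may be worth confirming the endpoint actually responds before re-running the index; if those calls fail during indexing, no create_final_*.parquet files are ever written. A quick check (a sketch, assuming the default Ollama port from the config above):

```bash
# Confirm the Ollama server is up and the models from settings.yaml are present
curl http://127.0.0.1:11434/api/tags

# Exercise the OpenAI-compatible endpoint that the config's api_base points at
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "ping"}]}'
```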

My .env:

```
# settings for OpenAI
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_BASE=https://api.deepseek.com/v1
OPENAI_API_KEY=
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

# settings for Azure OpenAI
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

# settings for Cohere
COHERE_API_KEY=<COHERE_API_KEY>

# settings for local models
LOCAL_MODEL=llama3.1:8b
LOCAL_MODEL_EMBEDDINGS=nomic-embed-text

# settings for GraphRAG
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
GRAPHRAG_LLM_MODEL=llama3.1:8b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text
# set to true if you want to use customized GraphRAG config file
USE_CUSTOMIZED_GRAPHRAG_SETTING=true

# settings for Azure DI
AZURE_DI_ENDPOINT=
AZURE_DI_CREDENTIAL=

# settings for Adobe API
# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
# also install pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
PDF_SERVICES_CLIENT_ID=
PDF_SERVICES_CLIENT_SECRET=

# settings for PDF.js
PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"
```

```
(base) root@autodl-container-3c3348b04d-889a978b:~# ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90... 100%  274 MB
pulling c71d239df917... 100%   11 KB
pulling ce4a164fc046... 100%   17 B
pulling 31df23ea7daa... 100%  420 B
verifying sha256 digest
writing manifest
success

(base) root@autodl-container-3c3348b04d-889a978b:~/autodl-tmp/kotaemon_071# ollama pull llama3.1:8b
pulling manifest
pulling 8eeb52dfb3bb... 100%  4.7 GB
pulling 948af2743fc7... 100%  1.5 KB
pulling 0ba8f0e314b4... 100%   12 KB
pulling 56bb8bd477a5... 100%   96 B
pulling 1a4c3c319823... 100%  485 B
verifying sha256 digest
writing manifest
success
```

sunnf8888 avatar Oct 21 '24 15:10 sunnf8888

Same error: output/create_final_nodes.parquet is missing.

sunnf8888 avatar Oct 23 '24 14:10 sunnf8888

FileNotFoundError: [Errno 2] No such file or directory: '/app/ktem_app_data/user_data/files/graphrag/d6d06e52-7acf-4ec6-b1f0-ec84b86fedaa/output/create_final_nodes.parquet'

Same error here.

joreyolo avatar Oct 24 '24 00:10 joreyolo

The server I'm using is an autodl server; I don't know whether that is related.

sunnf8888 avatar Oct 24 '24 04:10 sunnf8888

> The server I'm using is an autodl server; I don't know whether that is related.

FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl-tmp/kotaemon_l/kotaemon/ktem_app_data/user_data/files/graphrag/2d1932f9-2623-406c-b72bnodes.parquet'

sunnf8888 avatar Oct 24 '24 05:10 sunnf8888