
[BUG] Fail to chat with GraphRAG

Open CinderZhang opened this issue 1 year ago • 9 comments

Description

```
Setting up quick upload event
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
User-id: None, can see public conversations: False
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x00000273B5140CA0>, FSPath=WindowsPath('R:/kotaemon-app/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x00000273B5140F40>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EB60>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EF20>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734D420>), mmr=False, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, FSPath=<theflow.base.unset_ object at 0x00000273FB1E1F60>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x00000273FB1E1F60>, VS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, file_ids=['e6ae8d9e-2419-47bd-b6e2-3607d7f5ced2'], user_id=<theflow.base.unset_ object at 0x00000273FB1E1F60>)]
searching in doc_ids []
Traceback (most recent call last):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\pages\chat\__init__.py", line 899, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 705, in stream
    docs, infos = self.retrieve(message, history)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 503, in retrieve
    retriever_docs = retriever_node(text=query)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1097, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\backends\base.py", line 151, in exec
    return run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 144, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 345, in run
    context_builder = self._build_graph_search()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 204, in _build_graph_search
    entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 667, in read_parquet
    return impl.read(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 267, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 140, in _get_path_or_handle
    handles = get_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'R:\kotaemon-app\ktem_app_data\user_data\files\graphrag\a8af56b7-550c-4f92-ba60-fcf2163838b7\output/create_final_nodes.parquet'
User-id: 1, can see public conversations: True
```

Reproduction steps

1. Go to 'File->GraphRAG'
2. Click on 'Upload'
3. Ask anything in the chat 
4. See error

Screenshots


Logs

No response

Browsers

No response

OS

No response

Additional information

I installed it with the "....bat" installer on Windows.

CinderZhang avatar Oct 20 '24 23:10 CinderZhang

Same problem here. I'm already on the latest version on Linux, installed with run_linux.sh, and the GraphRAG part still doesn't work.

I tried modifying the run_linux.sh part like below:

```bash
if pip list 2>/dev/null | grep -q "kotaemon"; then
    python -m pip install graphrag future   # new line
    echo "Requirements are already installed"
else
    ..........
```

Now it fails with:

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 4.39.0 requires aiofiles<24.0,>=22.0, but you have aiofiles 24.1.0 which is incompatible.
kotaemon 0.7.0 requires tenacity<8.3,>=8.2.3, but you have tenacity 9.0.0 which is incompatible.
langchain 0.2.15 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.2.11 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.2.41 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-core 0.10.68.post1 requires tenacity!=8.4.0,<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-legacy 0.9.48.post3 requires tenacity<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
```
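A possible workaround (just a sketch, untested against a fresh install) is to let graphrag install first and then pin the conflicting packages back into the ranges the resolver lists above:

```bash
# Install graphrag, then restore the versions kotaemon/gradio expect.
# The pins below are taken directly from the pip error message above.
python -m pip install graphrag future
python -m pip install "tenacity>=8.2.3,<8.3" "aiofiles>=22.0,<24.0"
```

graphrag itself may then complain about the downgraded tenacity, so treat this as a stopgap rather than a proper fix.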

So it seems to be an environment error. I haven't looked into the pip list yet; that may turn up something.

CaMi1le avatar Oct 21 '24 01:10 CaMi1le

Did you set the GraphRAG API key correctly as mentioned in https://github.com/Cinnamon/kotaemon#setup-graphrag?
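For reference, that setup boils down to having the key defined in the .env file at the app root before indexing, e.g. (placeholder value, not a real key):

```bash
# .env — GraphRAG reads its API key from this variable
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
```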

taprosoft avatar Oct 21 '24 06:10 taprosoft

I got the same error even though I have set the GraphRAG API key in the .env file; it persists with the key in place.

ajayarunachalam avatar Oct 21 '24 15:10 ajayarunachalam

I have the same error. My settings.yaml:

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  api_base: http://127.0.0.1:11434/v1
  model: llama3.1:8b
  model_supports_json: true # recommended if this is available for your model.
  request_timeout: 1800.0
  concurrent_requests: 5 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_chat # or azure_openai_chat
    api_base: http://127.0.0.1:11434/v1
    model: nomic-embed-text
    type: openai_embedding
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
```
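Since this settings.yaml points GraphRAG at a local Ollama server, it may be worth confirming the endpoint actually responds before re-running the index; if those calls fail during indexing, no create_final_*.parquet files are ever written. A quick check (a sketch, assuming the default Ollama port from the config above):

```bash
# Confirm the Ollama server is up and the models from settings.yaml are present
curl http://127.0.0.1:11434/api/tags

# Exercise the OpenAI-compatible endpoint that the config's api_base points at
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "ping"}]}'
```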

My .env:

```
# settings for OpenAI
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_BASE=https://api.deepseek.com/v1
OPENAI_API_KEY=
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

# settings for Azure OpenAI
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

# settings for Cohere
COHERE_API_KEY=<COHERE_API_KEY>

# settings for local models
LOCAL_MODEL=llama3.1:8b
LOCAL_MODEL_EMBEDDINGS=nomic-embed-text

# settings for GraphRAG
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
GRAPHRAG_LLM_MODEL=llama3.1:8b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text
# set to true if you want to use customized GraphRAG config file
USE_CUSTOMIZED_GRAPHRAG_SETTING=true

# settings for Azure DI
AZURE_DI_ENDPOINT=
AZURE_DI_CREDENTIAL=

# settings for Adobe API
# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
# also install pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
PDF_SERVICES_CLIENT_ID=
PDF_SERVICES_CLIENT_SECRET=

# settings for PDF.js
PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"
```

```
(base) root@autodl-container-3c3348b04d-889a978b:~# ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90... 100%  274 MB
pulling c71d239df917... 100%   11 KB
pulling ce4a164fc046... 100%   17 B
pulling 31df23ea7daa... 100%  420 B
verifying sha256 digest
writing manifest
success

(base) root@autodl-container-3c3348b04d-889a978b:~/autodl-tmp/kotaemon_071# ollama pull llama3.1:8b
pulling manifest
pulling 8eeb52dfb3bb... 100%  4.7 GB
pulling 948af2743fc7... 100%  1.5 KB
pulling 0ba8f0e314b4... 100%   12 KB
pulling 56bb8bd477a5... 100%   96 B
pulling 1a4c3c319823... 100%  485 B
verifying sha256 digest
writing manifest
success
```

sunnf8888 avatar Oct 21 '24 15:10 sunnf8888

Same error: output/create_final_nodes.parquet is missing.

sunnf8888 avatar Oct 23 '24 14:10 sunnf8888

FileNotFoundError: [Errno 2] No such file or directory: '/app/ktem_app_data/user_data/files/graphrag/d6d06e52-7acf-4ec6-b1f0-ec84b86fedaa/output/create_final_nodes.parquet'

Same error here.

joreyolo avatar Oct 24 '24 00:10 joreyolo

The server I'm using is an autodl server; I don't know whether that is related.

sunnf8888 avatar Oct 24 '24 04:10 sunnf8888

> The server I'm using is an autodl server; I don't know whether that is related.

FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl-tmp/kotaemon_l/kotaemon/ktem_app_data/user_data/files/graphrag/2d1932f9-2623-406c-b72bnodes.parquet'

sunnf8888 avatar Oct 24 '24 05:10 sunnf8888