[BUG] - The error occurred while using the graphrag collection feature

Open yuqiao9 opened this issue 1 year ago • 0 comments

Description

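Querying a GraphRAG collection fails with a NotADirectoryError: the retriever tries to read create_final_nodes.parquet from a path in which stats.json is treated as a directory (full traceback under Logs).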

Reproduction steps

Upload files to the GraphRAG collection, then select the GraphRAG collection and send a message in chat.

Screenshots

No response

Logs

Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7f3f078073a0>, FSPath=PosixPath('/app/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f3f07a40040>, get_extra_table=False, llm_scorer=None, mmr=False, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0', use_key_from_ktem=True)], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7f3fcec92320>, FSPath=<theflow.base.unset_ object at 0x7f3fcec92320>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7f3fcec92320>, VS=<theflow.base.unset_ object at 0x7f3fcec92320>, file_ids=['60513354-be19-42c6-a4fb-b65887c2bbe7'], user_id=<theflow.base.unset_ object at 0x7f3fcec92320>)]
searching in doc_ids []
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/app/libs/ktem/ktem/pages/chat/__init__.py", line 804, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "/app/libs/ktem/ktem/reasoning/simple.py", line 655, in stream
    docs, infos = self.retrieve(message, history)
  File "/app/libs/ktem/ktem/reasoning/simple.py", line 483, in retrieve
    retriever_docs = retriever_node(text=query)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/app/libs/ktem/ktem/index/file/graph/pipelines.py", line 321, in run
    context_builder = self._build_graph_search()
  File "/app/libs/ktem/ktem/index/file/graph/pipelines.py", line 198, in _build_graph_search
    entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 267, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 140, in _get_path_or_handle
    handles = get_handle(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)
NotADirectoryError: [Errno 20] Not a directory: '/app/ktem_app_data/user_data/files/graphrag/15f966fc-a057-4bb7-b308-8a007cce8110/output/stats.json/artifacts/create_final_nodes.parquet'
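
The failing path ends in output/stats.json/artifacts/create_final_nodes.parquet, so a regular file (stats.json) was used as if it were the run directory under output/. A plausible cause, not confirmed against the kotaemon source here, is that the retriever takes the first entry it finds under output/ without checking that it is a directory. The sketch below shows one way to locate the artifacts folder defensively. The helper name find_artifacts_dir and both directory layouts are assumptions for illustration; only the parquet file name comes from the traceback.

```python
from pathlib import Path
from typing import Optional

# File name taken from the traceback above.
ENTITY_FILE = "create_final_nodes.parquet"


def find_artifacts_dir(output_dir: Path) -> Optional[Path]:
    """Return a directory under output_dir that contains ENTITY_FILE, or None.

    Hypothetical helper. It checks two assumed layouts:
      output/artifacts/            (artifacts written directly under output)
      output/<run>/artifacts/      (one sub-directory per indexing run)
    Plain files such as stats.json are skipped instead of being treated
    as run directories.
    """
    candidates = [output_dir / "artifacts"]
    if output_dir.is_dir():
        # Only directories can be runs; sort by name, newest-looking first.
        runs = sorted(
            (p for p in output_dir.iterdir() if p.is_dir()),
            key=lambda p: p.name,
            reverse=True,
        )
        candidates.extend(run / "artifacts" for run in runs)
    for candidate in candidates:
        if (candidate / ENTITY_FILE).is_file():
            return candidate
    return None
```

If this returns None for the graphrag/<id>/output folder named in the error, the index artifacts were probably never written, which would point at the indexing step rather than at retrieval.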

Browsers

Microsoft Edge

OS

Linux

Additional information

No response

yuqiao9 • Sep 26 '24 02:09