
[Bug]: Unable to local query using latest main branch with error "FileNotFoundError: Table entity_description_embeddings does not exist."

Open GuityOrange opened this issue 1 year ago • 0 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I've been using GraphRAG for two weeks now, and I updated to the latest main branch to solve the issue where LLMs return faulty responses in non-JSON mode. However, I'm now unable to run local queries. In case past operations were interfering, I even cloned a fresh copy of the code and rebuilt the index from scratch, but I still hit the same issue. `poetry run poe index` and global queries work fine, but I get an error when running a local query. The error:

poetry run poe query --root . --method local "What is the service track?"

Poe => python -m graphrag.query --root . --method local 'What is the service track?'

INFO: Reading settings from settings.yaml
INFO: Vector Store Args: {}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/littleKitty/PycharmProjects/graphrag2/graphrag/query/__main__.py", line 83, in <module>
    run_local_search(
  File "/Users/littleKitty/PycharmProjects/graphrag2/graphrag/query/cli.py", line 162, in run_local_search
    description_embedding_store = __get_embedding_description_store(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/littleKitty/PycharmProjects/graphrag2/graphrag/query/cli.py", line 75, in __get_embedding_description_store
    description_embedding_store.db_connection.open_table(
  File "/Users/littleKitty/Library/Caches/pypoetry/virtualenvs/graphrag-kyfzN3S0-py3.11/lib/python3.11/site-packages/lancedb/db.py", line 445, in open_table
    return LanceTable.open(self, name, index_cache_size=index_cache_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/littleKitty/Library/Caches/pypoetry/virtualenvs/graphrag-kyfzN3S0-py3.11/lib/python3.11/site-packages/lancedb/table.py", line 937, in open
    raise FileNotFoundError(
FileNotFoundError: Table entity_description_embeddings does not exist. Please first call db.create_table(entity_description_embeddings, data)
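The traceback shows that the local search CLI tries to open a LanceDB table named `entity_description_embeddings` that was never created. LanceDB stores each table as a `<name>.lance` directory under the database path, so one quick diagnostic is to scan the pipeline output for such directories. This is a hedged sketch, not part of GraphRAG: `OUTPUT_DIR` is an assumption about where your run writes its artifacts, so adjust it to your setup.

```python
from pathlib import Path

# Assumed location of the pipeline output; change this to match your run.
OUTPUT_DIR = Path("./output")

def find_lancedb_tables(root: Path) -> list[str]:
    """Return names of LanceDB tables (directories ending in .lance) under root."""
    return sorted(p.stem for p in root.rglob("*.lance") if p.is_dir())

if __name__ == "__main__":
    tables = find_lancedb_tables(OUTPUT_DIR)
    print("LanceDB tables found:", tables)
    if "entity_description_embeddings" not in tables:
        # The table the local query expects is absent: either indexing never
        # wrote it, or the query CLI is pointed at the wrong database path.
        print("entity_description_embeddings is missing from", OUTPUT_DIR)
```

If no `.lance` directories show up at all, the indexing run likely never wrote embeddings to a vector store, which points at the configuration rather than the query step.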

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4o-2024-05-13
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-ada-002
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Additional Information

  • GraphRAG Version: main branch as of 2024-08-02
  • Operating System: macos
  • Python Version: 3.11
  • Related Issues:

Thanks for the help!

GuityOrange avatar Aug 05 '24 10:08 GuityOrange