Do you need to file an issue?

[ ] I have searched the existing issues and this bug is not already filed.
[ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
[ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

Errors

❌ create_base_entity_graph None
⠙ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_final_covariates
├── create_summarized_entities
├── join_text_units_to_covariate_ids
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

Steps to reproduce

logs

{"type": "error", "data": "Error Invoking LLM", "stack": "Traceback (most recent call last):\n File

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_CHAT_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: ${GRAPHRAG_CHAT_MODEL}
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  api_base: ${GRAPHRAG_API_BASE}
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_EMBEDDING_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: ${GRAPHRAG_EMBEDDING_MODEL}
    # api_base: https://<instance>.openai.azure.com
    api_base: ${GRAPHRAG_API_BASE}
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional



chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  # base_dir: "input"
  base_dir: ${GRAPHRAG_INPUT_DIR}
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  # base_dir: "cache"
  base_dir: ${GRAPHRAG_CACHE_DIR}
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  # base_dir: "output/${timestamp}/artifacts"
  # base_dir: "inputs/artifacts"
  base_dir: ${GRAPHRAG_STORAGE_DIR}
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  # base_dir: "inputs/reports"
  base_dir: ${GRAPHRAG_REPORTING_DIR}
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # prompt: "prompts/entity_extraction.txt"
  prompt: ${GRAPHRAG_ENTITY_EXTRACTION_PROMPT_FILE}
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # prompt: "prompts/summarize_descriptions.txt"
  prompt: ${GRAPHRAG_SUMMARIZE_DESCRIPTIONS_PROMPT_FILE}
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enable covariates
  enabled: true
  # prompt: "prompts/claim_extraction.txt"
  prompt: ${GRAPHRAG_CLAIM_EXTRACTION_PROMPT_FILE}
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # prompt: "prompts/community_report.txt"
  prompt: ${GRAPHRAG_COMMUNITY_REPORT_PROMPT_FILE}
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
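
Nearly every active value in the config above is a `${...}` placeholder, so an unset environment variable is one possible (unconfirmed) cause of a bad `api_base`. The following is a minimal sketch, not part of GraphRAG, that lists placeholders with no value in the environment. The helper name `unset_placeholders` and the file name `settings.yaml` are assumptions, and the comment stripping is deliberately crude (it cuts each line at the first `#`).

```python
import os
import re

# Matches ${NAME} placeholders such as ${GRAPHRAG_API_BASE}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def unset_placeholders(yaml_text, env):
    """Return placeholder names used on non-comment text that are empty or missing in env."""
    missing = []
    for line in yaml_text.splitlines():
        # Crude comment removal: ignore everything after the first '#'.
        active = line.split("#", 1)[0]
        for name in PLACEHOLDER.findall(active):
            if not env.get(name) and name not in missing:
                missing.append(name)
    return missing


if __name__ == "__main__":
    # "settings.yaml" is an assumed file name; point this at your own config.
    path = "settings.yaml"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            for name in unset_placeholders(fh.read(), os.environ):
                print(f"not set: {name}")
```

This only inspects the process environment. If the values are loaded from a `.env` file, export them first or pass the parsed values as `env`.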

Logs and screenshots

No response

Additional Information

GraphRAG Version:
Operating System:
Python Version:
Related Issues:

Sep 21 '24 13:09 wy371900521

What's going on here? Has anyone found a fix for this? Please get back to me. Thank you so much.

Sep 21 '24 13:09 wy371900521

I am having the same issue as you. My models are deployed to Azure OpenAI.

The error message indicates that the URL you are trying to request is missing the required 'http://' or 'https://' protocol prefix. This is causing an UnsupportedProtocol exception when making a request using the httpx library.

File "/opt/anaconda3/envs/graphragenv/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 53, in _execute_llm\n completion = await self.client.chat.completions.create
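
To check for the missing-prefix problem described above before running the indexer, a small sketch like this may help. It assumes the base URL comes from the `GRAPHRAG_API_BASE` environment variable, as in the posted config. The function name `check_api_base` is hypothetical and not part of GraphRAG or httpx.

```python
import os
from urllib.parse import urlparse


def check_api_base(value):
    """Return a description of the problem, or None if the URL looks usable."""
    if not value:
        return "api_base is empty or unset"
    parsed = urlparse(value)
    # httpx rejects URLs whose scheme is not http or https.
    if parsed.scheme not in ("http", "https"):
        return f"api_base {value!r} is missing the http:// or https:// prefix"
    if not parsed.netloc:
        return f"api_base {value!r} has no host name"
    return None


if __name__ == "__main__":
    problem = check_api_base(os.environ.get("GRAPHRAG_API_BASE"))
    print(problem or "api_base looks well-formed")
```

A value that passes this check can still be wrong (for example, the wrong host or path). The check only rules out the `UnsupportedProtocol` case.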

Sep 23 '24 16:09 aleixlahozt

check this:

https://github.com/microsoft/graphrag/blob/main/v1-breaking-changes.md

Sep 23 '24 16:09 aleixlahozt

Have you solved your problem?

Sep 24 '24 01:09 yinzih

Not yet.

Sep 24 '24 05:09 wy371900521

Can you upload your indexing-engine.log so we can see more error details?

Oct 01 '24 22:10 natoverse

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

Oct 09 '24 01:10 github-actions[bot]

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

Oct 18 '24 01:10 github-actions[bot]

❌ create_base_entity_graph ❌ Errors occurred during the pipeline run, see logs for more details.
