[Bug]: Auto prompt tuning - ValueError: Single '}' encountered in format string

Open ashkan-software2 opened this issue 7 months ago • 5 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

Hello,

During auto prompt tuning, GraphRAG generates knowledge graph output that is malformed.

Bug: the knowledge graph is not valid because it contains more } characters than { characters.

Steps to reproduce

  1. Initialize GraphRAG
  2. Provide some paragraphs from this PDF as input: https://kpmg.com/kpmg-us/content/dam/kpmg/frv/pdf/2024/handbook-revenue-recognition-1224.pdf
  3. Run prompt tuning

You will see this error:

Traceback (most recent call last):
  File ".../pypoetry/virtualenvs/service-vector-embedding-6NKDQ0ig-py3.11/lib/python3.11/site-packages/graphrag/index/operations/extract_graph/graph_extractor.py", line 127, in __call__
    result = await self._process_document(text, prompt_variables)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../pypoetry/virtualenvs/service-vector-embedding-6NKDQ0ig-py3.11/lib/python3.11/site-packages/graphrag/index/operations/extract_graph/graph_extractor.py", line 156, in _process_document
    self._extraction_prompt.format(**{
ValueError: Single '}' encountered in format string

When I look at extract_graph.txt, I can see the problem. In the excerpt below there are 15 { but 19 }; note, for example, the extra } in "advance})".

extract_graph.txt

("entity"{tuple_delimiter}HOSTING SERVICE FEES{tuple_delimiter}cost types{tuple_delimiter}Fees for hosting services, charged at $100 per month, paid in advance})
{record_delimiter}
("entity"{tuple_delimiter}REMAINING TERM OF THE HOSTING ARRANGEMENT{tuple_delimiter}lease arrangements{tuple_delimiter}The duration left on the hosting arrangement from the go-live date, which is 5 years})
{record_delimiter}
("entity"{tuple_delimiter}GO-LIVE DATE{tuple_delimiter}implementation details{tuple_delimiter}The date when the cloud-based solution became operational, which is January 1, Year 3})
{record_delimiter}
("entity"{tuple_delimiter}CAPITALIZED IMPLEMENTATION COSTS – PAYROLL MODULE{tuple_delimiter}cost types{tuple_delimiter}The costs incurred to implement the payroll processing module, amounting to $400, which are capitalized})

Expected Behavior

The extract_graph.txt should contain an equal number of { and } and be free of errors.
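
The failure can be reproduced outside GraphRAG, since Python's str.format() rejects any unpaired }. The sketch below is only an illustration: the sample line is shortened from the excerpt above, the delimiter value "<|>" is an assumption, and unbalanced_lines is a hypothetical helper, not part of GraphRAG. Its per-line count is a heuristic that would also flag legitimate braces spanning several lines.

```python
# Minimal reproduction, independent of GraphRAG: str.format() rejects an unpaired '}'.
line = (
    '("entity"{tuple_delimiter}HOSTING SERVICE FEES{tuple_delimiter}'
    "cost types{tuple_delimiter}paid in advance})"
)

try:
    line.format(tuple_delimiter="<|>")  # "<|>" is an assumed delimiter value
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)


def unbalanced_lines(text):
    """Return (line_number, open_count, close_count) for lines whose brace counts differ."""
    report = []
    for number, content in enumerate(text.splitlines(), start=1):
        opens, closes = content.count("{"), content.count("}")
        if opens != closes:
            report.append((number, opens, closes))
    return report


sample = line + "\n{record_delimiter}\n"
print(unbalanced_lines(sample))
```

Running a check like this over a generated prompt file points at the lines that need attention before indexing is started.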

GraphRAG Config Used

models:
  default_chat_model:
    type: openai_chat
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: gpt-4-turbo-preview
    model_supports_json: true
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
  default_embedding_model:
    type: openai_embedding
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: text-embedding-3-small
    model_supports_json: true
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: true
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
input:
  type: file
  file_type: json
  base_dir: input
  text_column: page_content
  title_column: title
  metadata:
  - page
  - data_type
  - figures
chunks:
  size: 1200
  overlap: 100
  group_by_columns:
  - id
cache:
  type: file
  base_dir: cache
reporting:
  type: file
  base_dir: logs
output:
  type: file
  base_dir: output
extract_graph:
  model_id: default_chat_model
  prompt: prompts/extract_graph.txt
  entity_types:
  - organization
  - trademark
  - publication
  - standard
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: prompts/summarize_descriptions.txt
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english
extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: prompts/extract_claims.txt
  description: Any claims or facts that could be relevant to information discovery.
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: prompts/community_report_graph.txt
  text_prompt: prompts/community_report_text.txt
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: false
  embeddings: false
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/local_search_system_prompt.txt
global_search:
  chat_model_id: default_chat_model
  map_prompt: prompts/global_search_map_system_prompt.txt
  reduce_prompt: prompts/global_search_reduce_system_prompt.txt
  knowledge_prompt: prompts/global_search_knowledge_system_prompt.txt
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/drift_search_system_prompt.txt
  reduce_prompt: prompts/drift_search_reduce_prompt.txt
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/basic_search_system_prompt.txt


Logs and screenshots

Additional Information

  • GraphRAG Version: 2.1.0
  • Operating System: Linux
  • Python Version: 3.11.2
  • Related Issues:

ashkan-software2 avatar May 01 '25 08:05 ashkan-software2

I confirm this is a bug: after I removed the extra } characters from extract_graph.txt, indexing proceeded without errors.
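
The manual cleanup could be scripted roughly as follows. This is a sketch under assumptions, not GraphRAG code: strip_stray_braces is a hypothetical helper, it assumes the template only uses {name}-style placeholders, and the delimiter values passed to format() are made up. It keeps placeholders and already-escaped {{ or }} pairs and drops any other lone brace, so two stray braces in a row would be mistaken for an escaped pair.

```python
import re

# Matches, in order: escaped "{{", escaped "}}", a "{name}" placeholder, or a lone brace.
_BRACE_TOKEN = re.compile(r"\{\{|\}\}|\{[A-Za-z_][A-Za-z0-9_]*\}|[{}]")


def strip_stray_braces(template: str) -> str:
    """Drop lone braces; keep placeholders and escaped brace pairs unchanged."""

    def keep_or_drop(match):
        token = match.group(0)
        return "" if token in ("{", "}") else token

    return _BRACE_TOKEN.sub(keep_or_drop, template)


broken = (
    '("entity"{tuple_delimiter}GO-LIVE DATE{tuple_delimiter}'
    "implementation details{tuple_delimiter}The date when the cloud-based "
    "solution became operational, which is January 1, Year 3})\n"
    "{record_delimiter}"
)

fixed = strip_stray_braces(broken)
# Delimiter values are assumptions chosen for the example.
rendered = fixed.format(tuple_delimiter="<|>", record_delimiter="##")
print(rendered)
```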

ashkan-software2 avatar May 01 '25 09:05 ashkan-software2

Please try again with version 2.2.1, which includes updates to the prompt template to resolve the format call removing too many braces.

natoverse avatar May 05 '25 23:05 natoverse

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

github-actions[bot] avatar May 13 '25 02:05 github-actions[bot]

I am facing another issue upon updating:

Traceback (most recent call last):
  File "/home/myuser/.cache/pypoetry/virtualenvs/service-vector-embedding-6NKDQ0ig-py3.11/lib/python3.11/site-packages/graphrag/index/operations/summarize_communities/community_reports_extractor.py", line 76, in __call__
    prompt = self._extraction_prompt.format(**{
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '\n        "title"'
17:53:13,739 graphrag.callbacks.file_workflow_callbacks INFO Community Report Extraction Error details=None
17:53:13,739 graphrag.index.operations.summarize_communities.strategies WARNING No report found for community: 260.0
17:53:13,739 graphrag.index.operations.summarize_communities.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
  File "/home/myuser/.cache/pypoetry/virtualenvs/service-vector-embedding-6NKDQ0ig-py3.11/lib/python3.11/site-packages/graphrag/index/operations/summarize_communities/community_reports_extractor.py", line 76, in __call__
    prompt = self._extraction_prompt.format(**{
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '\n        "title"'
17:53:13,740 graphrag.callbacks.file_workflow_callbacks INFO Community Report Extraction Error details=None
17:53:13,740 graphrag.index.operations.summarize_communities.strategies WARNING No report found for community: 261.0
17:53:13,740 graphrag.index.operations.summarize_communities.community_reports_extractor ERROR error generating community report

The contents of the related prompt files are below (I show only part of each):

community_report_graph.txt

parsed by json.loads.
    {
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [

community_report_text.txt

parsed by json.loads.
    {{
        "title": "<report_title>",
        "summary": "<executive_summary>",
        "rating": <importance_rating>,
        "rating_explanation": "<rating_explanation>",
        "findings": [{{"summary":"<insight_1_summary>", "explanation": 

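The KeyError is consistent with how str.format() parses a single {: everything up to the next colon is read as a field name, which here is a newline, the indentation, and "title". The text prompt doubles its braces ({{), so they stay literal. The sketch below illustrates the difference with abbreviated stand-ins for the two files; the {input_text} placeholder name is an assumption made for the example.

```python
# Abbreviated stand-ins for the two prompt files; {input_text} is an assumed placeholder.
unescaped = (
    "parsed by json.loads.\n"
    "    {\n"
    '        "title": <report_title>\n'
    "    }\n"
    "Text: {input_text}"
)
escaped = (
    "parsed by json.loads.\n"
    "    {{\n"
    '        "title": "<report_title>"\n'
    "    }}\n"
    "Text: {input_text}"
)

try:
    unescaped.format(input_text="abc")
    missing_key = None
except KeyError as exc:
    # The "field name" is the text between the single '{' and the first ':'.
    missing_key = exc.args[0]

print(repr(missing_key))

# Doubled braces are emitted as literal single braces, so formatting succeeds.
rendered = escaped.format(input_text="abc")
print(rendered)
```

If that reading is right, doubling the literal braces in community_report_graph.txt, as community_report_text.txt already does, should avoid the error.
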
ashkan-software2 avatar May 15 '25 07:05 ashkan-software2

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

github-actions[bot] avatar May 24 '25 02:05 github-actions[bot]

This issue has been closed after being marked as stale for five days. Please reopen if needed.

github-actions[bot] avatar May 29 '25 02:05 github-actions[bot]