[Bug]: json.decoder.JSONDecodeError when generating Community reports
Describe the bug
The community report generation step fails with a json.decoder.JSONDecodeError because the LLM output is wrapped in doubled braces, e.g. json={{ "title": "Product Team: Mansz and Jrman" {{. I tried to fix the system message for the community report, but the error still persists, and when I looked into the indexing report it shows that the community_report prompt is null.
In settings.yaml, the prompt filename for the community report is "prompts/community_report.txt". I changed the double braces '{{' to single braces '{' in community_report.txt, but the JSON is still being created with double '{{':
community_report:
  prompt: "prompts/community_report.txt"
  max_length: 4000
  max_input_length: 12000
Also, the "community_reports" section of indexing-engine.log shows "prompt": null rather than the 'prompts/community_report.txt' filename specified in settings.yaml:
"community_reports":
"async_mode": "threaded",
"prompt": null,
"max_length": 2000,
"max_input_length": 8000,
"strategy": null
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat # or azure_openai_chat
  model: gpt-4-32k (0613)
  model_supports_json: false
  max_tokens: 4000
  request_timeout: 180.0
  api_base: <removed for security purposes>
  api_version: '2023-05-15'
  organization: <organization_id>
  deployment_name: gpt-4-32k
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 10
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding
    model: text-embedding-ada-002
    api_base: <removed for security purposes>
    api_version: '2023-05-15'
    # organization: <organization_id>
    deployment_name: embedding
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 4000
  max_input_length: 12000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  num_walks: 10
  walk_length: 40
  window_size: 2
  iterations: 3
  random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  text_unit_prop: 0.5
  community_prop: 0.1
  conversation_history_max_turns: 5
  top_k_mapped_entities: 10
  top_k_relationships: 10
  max_tokens: 12000

global_search:
  max_tokens: 12000
  data_max_tokens: 12000
  map_max_tokens: 1000
  reduce_max_tokens: 2000
  concurrency: 32
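Possibly relevant to the null prompt: the log prints the section name as "community_reports" (plural) while the YAML above defines community_report (singular). A quick check along these lines (assuming PyYAML; paths are illustrative) can confirm which key the config file actually defines and whether the prompt file resolves:

from pathlib import Path

import yaml  # PyYAML

root = Path("GraphRAG")  # "root_dir" from the log; adjust to your project root
cfg = yaml.safe_load((root / "settings.yaml").read_text())

# Compare the key the pipeline reports ("community_reports") with the one defined.
for key in ("community_reports", "community_report"):
    section = cfg.get(key)
    print(f"{key}: {section}")
    if section and section.get("prompt"):
        prompt_path = root / section["prompt"]
        print("  prompt file exists:", prompt_path.exists())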
Logs and screenshots
20:19:16,982 graphrag.config.read_dotenv INFO Loading pipeline .env file 20:19:16,988 graphrag.index.cli INFO using default configuration: { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "root_dir": "GraphRAG/", "reporting": { "type": "file", "base_dir": "output/${timestamp}/reports", "storage_account_blob_url": null }, "storage": { "type": "file", "base_dir": "output/${timestamp}/artifacts", "storage_account_blob_url": null }, "cache": { "type": "file", "base_dir": "cache", "storage_account_blob_url": null }, "input": { "type": "file", "file_type": "text", "base_dir": "input", "storage_account_blob_url": null, "encoding": "utf-8", "file_pattern": ".*\.txt$", "file_filter": null, "source_column": null, "timestamp_column": null, "timestamp_format": null, "text_column": "text", "title_column": null, "document_attribute_columns": [] }, "embed_graph": { "enabled": false, "num_walks": 10, "walk_length": 40, "window_size": 2, "iterations": 3, "random_seed": 597832, "strategy": null }, "embeddings": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_embedding", "model": "text-embedding-ada-002", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "embedding", "model_supports_json": null, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "batch_size": 16, "batch_max_tokens": 8191, "target": "required", "skip": [], "vector_store": null, "strategy": null }, "chunks": { "size": 300, "overlap": 100, "group_by_columns": [ "id" ], "strategy": null }, "snapshots": { "graphml": false, "raw_entities": false, "top_level_nodes": false }, "entity_extraction": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": "prompts/entity_extraction.txt", "entity_types": [ "organization", "person", "geo", "event" ], "max_gleanings": 0, "strategy": null }, "summarize_descriptions": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", 
"model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": "prompts/summarize_descriptions.txt", "max_length": 500, "strategy": null }, "community_reports": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": null, "max_length": 2000, "max_input_length": 8000, "strategy": null }, "claim_extraction": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "enabled": true, "prompt": "prompts/claim_extraction.txt", "description": "Any claims or facts that could be relevant to information discovery.", "max_gleanings": 0, "strategy": null }, "cluster_graph": { "max_cluster_size": 10, "strategy": null }, "umap": { "enabled": false }, "local_search": { "text_unit_prop": 0.5, "community_prop": 0.1, "conversation_history_max_turns": 5, "top_k_entities": 10, "top_k_relationships": 10, "max_tokens": 12000, "llm_max_tokens": 2000 }, "global_search": { "max_tokens": 12000, "data_max_tokens": 12000, "map_max_tokens": 1000, "reduce_max_tokens": 2000, "concurrency": 32 }, "encoding_model": "cl100k_base", "skip_workflows": [] }
20:20:39,273 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
20:20:39,273 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 0
20:20:39,346 httpx INFO HTTP Request: POST --"
20:20:39,347 graphrag.llm.openai.utils ERROR error loading json, json={{ "title": "Application Support Team and Controlled Environment", "summary": "The community revolves around the Application Support Team, which provides assistance to users experiencing problems with the application. The team interacts with various features of the application, including the Controlled Environment, Admin Tab, In-app Support Ticket System, Statuses File, and Summary View.", "rating": 7.0, "rating_explanation": "The impact severity rating is high due to the critical role of the Application Support Team in ensuring smooth operation of the application.", "findings": [ {{ "summary": "Functionality of the Summary View", "explanation": "The Summary View is a customizable section of the application where users can adjust the display of information. The Application Support Team can provide assistance for customizing the Summary View, indicating its complexity and potential for user customization. [Data: Entities (26), Relationships (37)]" }} ]}}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
  File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
20:20:39,349 graphrag.llm.openai.openai_chat_llm WARNING error parsing llm json, retrying
20:20:39,978 httpx INFO HTTP Request: POST https://agvisorapimtest.azure-api.net/openapi-test/openai/deployments/gpt-4-32k/chat/completions?api-version=2023-05-15 "HTTP/1.1 200 OK"
20:20:39,980 graphrag.llm.openai.utils ERROR error loading json, json={output_text}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 124, in _manual_json
    json_output = try_parse_json_object(output)
  File "/opt/conda/lib/python3.10/site-packages/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
  File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
During handling of the above exception, another exception occurred:
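For context on the doubled braces: the community report prompt is rendered with Python str.format-style substitution, so the JSON example inside the template escapes literal braces as {{ and }}. Those render as single braces in the request actually sent, but the model sometimes echoes the escaped form back verbatim, which is exactly what the failing payload above shows. It also explains why rewriting community_report.txt with single braces cannot work: unescaped braces break the formatting step itself. A minimal sketch of the mechanism (generic Python, not graphrag's actual code):

# Template as it would appear in a prompt file: '{{'/'}}' are format-escapes
# for literal braces; '{input_text}' is a real substitution slot.
template = 'Return output as JSON: {{ "title": "<title>", "input": "{input_text}" }}'
print(template.format(input_text="abc"))
# -> Return output as JSON: { "title": "<title>", "input": "abc" }

# With single braces in the file, .format() treats '{ "title"...' as a
# replacement field and fails before any request is ever made.
broken = 'Return output as JSON: { "title": "<title>" }'
try:
    broken.format(input_text="abc")
except (KeyError, ValueError) as err:
    print(type(err).__name__, err)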
Additional Information
- GraphRAG Version: 0.1.1
- Operating System: AWS SageMaker distribution 1.9
- Python Version: 3.10.14
- Related Issues: