graphrag
[Issue]: Report only includes partial content for indexing multiple docs
Is there an existing issue for this?
- [X] I have searched the existing issues
- [X] I have checked #657 to validate if my issue is covered by community support
Describe the issue
When I place multiple txt files in the "input" folder and run indexing, the community reports only include content from one of the files. This happens especially when the documents are only loosely related to each other. The extraction of entities and relationships, however, is correct. I tried prompt tuning before indexing, but it does not seem to have any effect.
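One way to narrow this down is to check which source documents each community actually draws on. The sketch below is only an illustration: the three input mappings (`community_entities`, `entity_text_units`, `text_unit_documents`) are hypothetical and would have to be built by hand from the indexing artifacts; they are not GraphRAG APIs or column names.

```python
def documents_per_community(community_entities, entity_text_units, text_unit_documents):
    """Map each community to the sorted list of documents its entities come from.

    All three arguments are hypothetical plain dicts assembled from the
    indexing output:
      community_entities:  community id -> list of entity names
      entity_text_units:   entity name  -> list of text unit ids
      text_unit_documents: text unit id -> list of document names
    """
    coverage = {}
    for community, entities in community_entities.items():
        docs = set()
        for entity in entities:
            # Follow entity -> text units -> documents; unknown keys contribute nothing.
            for unit in entity_text_units.get(entity, ()):
                docs.update(text_unit_documents.get(unit, ()))
        coverage[community] = sorted(docs)
    return coverage
```

If every community maps to a single document, the entities of the different files may simply have been clustered into separate communities, which would be a different situation from reports being generated for only one file.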
Steps to reproduce
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${AZURE_API_KEY}
  type: azure_openai_chat
  api_base: ${AZURE_ENDPOINT}
  api_version: ${API_VERSION}
  deployment_name: gpt-4o
  model_supports_json: true
parallelization:
  stagger: 0.3
async_mode: threaded
embeddings:
  async_mode: threaded
  llm:
    api_key: ${AZURE_API_KEY}
    type: azure_openai_embedding
    api_base: ${AZURE_ENDPOINT}
    api_version: ${API_VERSION}
    deployment_name: text-embedding-ada-002
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id]
input:
  type: file
  file_type: text
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file
  base_dir: "cache"
storage:
  type: file
  base_dir: "output/${timestamp}/artifacts"
reporting:
  type: file
  base_dir: "output/${timestamp}/reports"
entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 0
summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_report:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false
local_search:
global_search:
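To rule out an input problem, the `input` settings above can be checked against the folder contents. The following is a minimal sketch that lists the files under `base_dir` whose relative path matches `file_pattern`. It is not GraphRAG's own loader: the function name `matching_inputs` is made up for illustration, and matching against the relative path with `re.search` is an assumption about how the pattern is applied.

```python
import re
from pathlib import Path

# The YAML value ".*\\.txt$" corresponds to this regular expression.
FILE_PATTERN = r".*\.txt$"


def matching_inputs(base_dir: str, pattern: str = FILE_PATTERN) -> list[str]:
    """Return sorted relative paths of files under base_dir that match pattern."""
    regex = re.compile(pattern)
    root = Path(base_dir)
    if not root.is_dir():
        return []
    matches = []
    for path in root.rglob("*"):
        # Use forward slashes so the result is the same on macOS and Windows.
        relative = path.relative_to(root).as_posix()
        if path.is_file() and regex.search(relative):
            matches.append(relative)
    return sorted(matches)


if __name__ == "__main__":
    for name in matching_inputs("input"):
        print(name)
```

If every txt file shows up in this listing, the pattern itself is unlikely to be what drops documents from the reports.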
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 0.2.2
- Operating System: macOS, Windows
- Python Version: 3.12.2
- Related Issues: