graphrag
[Issue]: Why are community reports in English when the text is in another language?
Describe the issue
My text is in Vietnamese, but the generated community reports are in English.
I have tried customizing the prompt and translating it into Vietnamese. The entity and relationship descriptions come out in Vietnamese, but the community reports are still in English.
There is also a hallucination issue: the generated descriptions contain external knowledge that is not in my source text.
Steps to reproduce
No response
GraphRAG Config Used
No response
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version: 3.10
- Related Issues:
Which LLM are you using, and what prompt are you using to create the community reports?
I use gpt-4-turbo-preview.
For the prompt, I first tried prompt tuning for my domain:
python -m graphrag.prompt_tune --root /path/to/project --no-entity-types
The pipeline has a language detection step, but it didn't work: the results were still in English.
Then I tried:
python -m graphrag.prompt_tune --root /path/to/project --method random --limit 10 --language Vietnamese --max-tokens 2048 --chunk-size 256 --no-entity-types --output /path/to/output
But it failed with an error saying there is no language argument.
So I used the default prompt from here: https://github.com/microsoft/graphrag/blob/main/graphrag/index/graph/extractors/community_reports/prompts.py, translated into Vietnamese.
The --language flag was added after the initial release, so if you are using the graphrag package from PyPI, the flag is not available, which would explain the error you are seeing.
I recommend either building the Python package from source (by running poetry build in the root directory of the repo) and installing the resulting Python wheel, or waiting for the new version that we will be releasing to PyPI very soon. Please reopen this issue if you still experience problems after testing the new release.
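To confirm whether a given install already includes the --language flag, one option is to inspect the help output of the prompt tuning module. The following is a minimal sketch, not part of GraphRAG: it assumes graphrag.prompt_tune prints argparse-style help when called with --help, and the helper names help_mentions_flag and prompt_tune_help are hypothetical.

```python
import re
import subprocess
import sys


def help_mentions_flag(help_text: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole option name in the help text."""
    # Reject matches that are part of a longer option such as --language-code.
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def prompt_tune_help() -> str:
    """Capture the --help output of the installed graphrag.prompt_tune module.

    Assumption: the module supports --help. If graphrag is not installed,
    the returned text is just the interpreter's error message.
    """
    result = subprocess.run(
        [sys.executable, "-m", "graphrag.prompt_tune", "--help"],
        capture_output=True,
        text=True,
        check=False,  # a missing module should not raise here
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    if help_mentions_flag(prompt_tune_help(), "--language"):
        print("--language is available in this install")
    else:
        print("--language is not available; install a newer build")
```

If the check reports that the flag is missing, the install predates the change described above, and building from source or upgrading once the new release is on PyPI should resolve the argument error.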