
[Issue]: Why are community reports in English when the text is in another language?

Open dinhngoc267 opened this issue 1 year ago • 2 comments

Describe the issue

My text is in Vietnamese, but the community reports I receive are in English.

I have tried customizing the prompt and translating it into Vietnamese. The entity and relationship descriptions come out in Vietnamese, but the community reports are still in English.

One more issue: there are still hallucinations. When I read the generated descriptions, some of them contain external knowledge.

Steps to reproduce

No response

GraphRAG Config Used

No response

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version: 3.10
  • Related Issues:

dinhngoc267 avatar Jul 18 '24 09:07 dinhngoc267

Which LLM do you use, and which prompt do you use to create the community reports?

ngoanpv avatar Jul 18 '24 10:07 ngoanpv

Which LLM do you use, and which prompt do you use to create the community reports?

I use gpt-4-turbo-preview.

As for the prompt, I first tried prompt tuning for my domain with:

python -m graphrag.prompt_tune --root /path/to/project --no-entity-types

The pipeline has a language detection step, but it didn't work: the community reports were still in English.

Then I tried:

python -m graphrag.prompt_tune --root /path/to/project --method random --limit 10 --language Vietnamese --max-tokens 2048 --chunk-size 256 --no-entity-types --output /path/to/output

But it failed with an error saying there is no language argument.

Then I just used the default prompt from here: https://github.com/microsoft/graphrag/blob/main/graphrag/index/graph/extractors/community_reports/prompts.py but translated into Vietnamese.

dinhngoc267 avatar Jul 18 '24 13:07 dinhngoc267

The --language flag was added after the initial release, so if you are using the GraphRAG package from PyPI, that flag is not available. That would explain the error you are seeing.

I recommend either building the Python package from source (by running poetry build in the root directory of the repo) and installing the resulting Python wheel, or waiting for the next release. We will be releasing a new version to PyPI very soon. Please reopen this issue if you still experience problems after testing out the new release.

jgbradley1 avatar Jul 22 '24 17:07 jgbradley1
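
To tell whether an installed copy of the package already has the --language flag discussed above, one option is to inspect the help output of the prompt tuning module. The following is a minimal sketch, not an official tool. It assumes that python -m graphrag.prompt_tune --help prints an argparse-style usage message listing the available flags; the helper names (help_mentions_flag, installed_version, prompt_tune_help) are hypothetical and are not part of GraphRAG.

```python
import re
import subprocess
import sys
from importlib import metadata
from typing import Optional


def help_mentions_flag(help_text: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole option name in CLI help text."""
    # Reject matches that are part of a longer option such as --language-foo.
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def installed_version(package: str = "graphrag") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def prompt_tune_help() -> str:
    """Capture the help output of the prompt tuning module.

    Assumption: the module accepts --help and prints its flags.
    """
    result = subprocess.run(
        [sys.executable, "-m", "graphrag.prompt_tune", "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    print("graphrag version:", installed_version())
    print("--language available:", help_mentions_flag(prompt_tune_help(), "--language"))
```

If the second line reports False, the installed release predates the flag, and building from source or upgrading once the new release is on PyPI is the way forward.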