[Issue]: When I run indexing in fast mode, the following error occurs. What is the cause?

jhyever opened this issue 7 months ago.

Do you need to file an issue?

[ ] I have searched the existing issues and this bug is not already filed.
[ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
[ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

@natoverse Could you help me look into this problem? Thank you very much.

Describe the issue

Command: `graphrag index --method fast --root ./graphrag_test`

ERROR:

```
17:42:58,872 graphrag.index.input.util INFO Total number of unfiltered InputFileType.text rows: 1
17:42:58,874 graphrag.index.run.run_pipeline INFO Final # of rows loaded: 1
17:42:58,885 graphrag.utils.storage INFO reading table from storage: documents.parquet
17:42:58,918 graphrag.utils.storage INFO reading table from storage: documents.parquet
17:42:58,922 graphrag.utils.storage INFO reading table from storage: text_units.parquet
17:42:58,959 graphrag.utils.storage INFO reading table from storage: text_units.parquet
17:42:58,963 graphrag.index.run.run_pipeline ERROR error running workflow extract_graph_nlp
Traceback (most recent call last):
  File "/data1/aimind/graphrag/graphrag/graphrag/index/run/run_pipeline.py", line 129, in _run_pipeline
    result = await workflow_function(config, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/graphrag/graphrag/graphrag/index/workflows/extract_graph_nlp.py", line 27, in run_workflow
    entities, relationships = await extract_graph_nlp(
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/graphrag/graphrag/graphrag/index/workflows/extract_graph_nlp.py", line 51, in extract_graph_nlp
    text_analyzer = create_noun_phrase_extractor(text_analyzer_config)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/graphrag/graphrag/graphrag/index/operations/build_noun_graph/np_extractors/factory.py", line 82, in create_noun_phrase_extractor
    return NounPhraseExtractorFactory.get_np_extractor(analyzer_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/graphrag/graphrag/graphrag/index/operations/build_noun_graph/np_extractors/factory.py", line 71, in get_np_extractor
    return RegexENNounPhraseExtractor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/graphrag/graphrag/graphrag/index/operations/build_noun_graph/np_extractors/regex_extractor.py", line 52, in __init__
    download_if_not_exists("punkt")
  File "/data1/aimind/graphrag/graphrag/graphrag/index/operations/build_noun_graph/np_extractors/resource_loader.py", line 31, in download_if_not_exists
    nltk.find(f"{category}/{resource_name}")
  File "/data1/aimind/anaconda3/envs/graphrag200/lib/python3.12/site-packages/nltk/data.py", line 551, in find
    return find(modified_name, paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/anaconda3/envs/graphrag200/lib/python3.12/site-packages/nltk/data.py", line 538, in find
    return ZipFilePathPointer(p, zipentry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/anaconda3/envs/graphrag200/lib/python3.12/site-packages/nltk/data.py", line 391, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/aimind/anaconda3/envs/graphrag200/lib/python3.12/site-packages/nltk/data.py", line 1020, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/data1/aimind/anaconda3/envs/graphrag200/lib/python3.12/zipfile/__init__.py", line 1349, in __init__
    self._RealGetContents()
  File "/data1/aimind/anaconda3/envs/graphrag200/lib/python3.12/zipfile/__init__.py", line 1416, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
```
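The traceback shows the failure happens before any text is processed. graphrag's `download_if_not_exists("punkt")` calls `nltk.find`, NLTK's fallback (`find(modified_name, paths)`) then looks for a `punkt.zip` archive, and `zipfile` rejects that file. A likely but unconfirmed cause is a truncated or non-zip file, such as an interrupted download or an HTML error page saved by a proxy, sitting in one of NLTK's data directories. The sketch below uses only the standard library plus `nltk.data.path` to list `.zip` files that `zipfile` cannot open. The helper name `find_corrupt_zips` is hypothetical and not part of graphrag or NLTK.

```python
import os
import zipfile


def find_corrupt_zips(search_dirs):
    """Return every *.zip under search_dirs that zipfile does not recognise as an archive."""
    corrupt = []
    for root in search_dirs:
        # nltk.data.path usually lists several directories that do not exist.
        if not os.path.isdir(root):
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(".zip"):
                    continue
                full_path = os.path.join(dirpath, name)
                # is_zipfile checks for a valid end-of-central-directory record;
                # when that record is missing, ZipFile() raises "File is not a zip file".
                if not zipfile.is_zipfile(full_path):
                    corrupt.append(full_path)
    return corrupt


if __name__ == "__main__":
    try:
        import nltk  # run inside the same environment as graphrag (graphrag200 above)
    except ImportError:
        print("nltk is not importable in this environment")
    else:
        for path in find_corrupt_zips(nltk.data.path):
            print("not a valid zip:", path)
```

Any path printed here is a candidate for the `BadZipFile` above. A result under `tokenizers/` would be the most direct match for the `punkt` lookup.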

Process: I used a zip package, but the same error is also reported when it contains a .txt file.
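If a broken archive is the cause, one hedged remedy is to move the broken file aside so NLTK can fetch a fresh copy. The zip package mentioned above is one candidate, if it was copied into an NLTK data directory by hand. The sketch assumes the archive sits at `tokenizers/punkt.zip`, NLTK's usual location for punkt. The traceback does not show which category graphrag passes to `nltk.find`, so adjust `relpath` if needed. `move_aside_corrupt_zip` is a hypothetical helper.

```python
import os
import zipfile


def move_aside_corrupt_zip(data_dirs, relpath="tokenizers/punkt.zip"):
    """Rename each invalid copy of relpath found under data_dirs to '<name>.bad'.

    Returns the list of original paths that were renamed; valid archives and
    missing files are left untouched.
    """
    moved = []
    for base in data_dirs:
        # Assumption: relpath uses "/" separators, as NLTK resource names do.
        candidate = os.path.join(base, *relpath.split("/"))
        if os.path.isfile(candidate) and not zipfile.is_zipfile(candidate):
            # Rename instead of delete so the bad file can still be inspected.
            os.replace(candidate, candidate + ".bad")
            moved.append(candidate)
    return moved
```

In the environment that runs graphrag, you would then call `move_aside_corrupt_zip(nltk.data.path)`, re-fetch the resource with `nltk.download("punkt", force=True)`, and rerun `graphrag index --method fast --root ./graphrag_test`.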

Steps to reproduce

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

GraphRAG Version:
Operating System:
Python Version:
Related Issues:

May 07 '25 09:05 jhyever