dify PackageNotFoundError: Package not found at '/tmp/tmp8umf7jfp/rgke

Self Checks

[x] This is only for bug report, if you would like to ask a question, please head to Discussions.
[x] I have searched for existing issues, including closed ones.
[x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
[x] Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

After I upgraded to version 0.15.3, the datasets API endpoint /datasets/{dataset_id}/document/create-by-file started reporting an error: PackageNotFoundError: Package not found at '/tmp/tmp8umf7jfp/rgke_f3n.docx'

I found the same error reported in https://github.com/langgenius/dify/issues/3888. It appeared in an older version, and 0.15.3 now has a similar problem.

The error stack is as follows:

2025-03-01 11:21:45.715 INFO [Dummy-40] [duplicate_document_indexing_task.py:60] - Start process document: d734005d-4693-4f52-b6e2-6f625d56cce0
2025-03-01 11:21:45.763 ERROR [Dummy-40] [indexing_runner.py:96] - consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 70, in run
    text_docs = self._extract(index_processor, dataset_document, processing_rule.to_dict())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/indexing_runner.py", line 351, in _extract
    text_docs = index_processor.extract(extract_setting, process_rule_mode=process_rule["mode"])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 22, in extract
    text_docs = ExtractProcessor.extract(
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/rag/extractor/extract_processor.py", line 161, in extract
    return extractor.extract()
           ^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/rag/extractor/word_extractor.py", line 65, in extract
    content = self.parse_docx(self.file_path, "storage")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/rag/extractor/word_extractor.py", line 210, in parse_docx
    doc = DocxDocument(docx_path)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/docx/api.py", line 27, in Document
    document_part = cast("DocumentPart", Package.open(docx).main_document_part)
                                         ^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/docx/opc/package.py", line 127, in open
    pkg_reader = PackageReader.from_file(pkg_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/docx/opc/pkgreader.py", line 22, in from_file
    phys_reader = PhysPkgReader(pkg_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/docx/opc/phys_pkg.py", line 21, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
docx.opc.exceptions.PackageNotFoundError: Package not found at '/tmp/tmp8umf7jfp/rgke_f3n.docx'
2025-03-01 11:21:45.771 INFO [Dummy-40] [duplicate_document_indexing_task.py:92] - Processed dataset: 1304edb6-ab60-4196-b9f4-55e2155bbf16 latency: 0.07849350199103355

✔️ Expected Behavior

The uploaded document is properly embedded

❌ Actual Behavior

Mar 01 '25 11:03 rchlz

@rchlz, hey there! I'm here to help you with any bug reports, questions, or contributions. Let me know what you need assistance with.

The error you're encountering, PackageNotFoundError: Package not found at '/tmp/tmp8umf7jfp/rgke_f3n.docx', suggests that the temporary file used for processing the .docx document is not being found at the expected path. This issue might be related to how temporary files are handled in the word_extractor.py module.

In Dify, temporary files for .docx documents are created using Python's tempfile.NamedTemporaryFile, and the cleanup of these files is managed by the __del__ method, which closes the temporary file when the WordExtractor object is deleted ^[1]. If the file is not found, it could be due to the temporary file being prematurely deleted or not being created correctly.
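As a rough illustration of the lifetime issue described above, here is a minimal sketch using only the Python standard library. It is not Dify's code; the function name `temp_file_lifetime` is made up for this example. It only shows that a file created with tempfile.NamedTemporaryFile (default delete=True) is removed from disk as soon as it is closed, so any reader that opens the path afterwards will not find it.

```python
import os
import tempfile


def temp_file_lifetime() -> tuple[bool, bool]:
    """Return (exists while open, exists after close) for a NamedTemporaryFile."""
    # Hypothetical helper for illustration only; not part of Dify.
    tmp = tempfile.NamedTemporaryFile(suffix=".docx")  # delete=True is the default
    tmp.write(b"placeholder bytes")
    tmp.flush()
    path = tmp.name

    exists_while_open = os.path.exists(path)
    tmp.close()  # with the default settings, closing also deletes the file
    exists_after_close = os.path.exists(path)
    return exists_while_open, exists_after_close


if __name__ == "__main__":
    print(temp_file_lifetime())
```

If a path like this is handed to another component, the object that owns the temporary file has to stay alive until that component has finished reading.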

Here are a few steps you can take to troubleshoot and potentially resolve the issue:

1. Check File Creation and Deletion: Ensure that the temporary file is being created and not deleted before it is accessed. You might want to add logging around the creation and deletion of the temporary file to verify this.
2. Verify File Path Handling: Make sure that the file path is correctly passed and used throughout the document processing workflow. Any discrepancies in file path handling could lead to the file not being found.
3. Review Recent Changes: Since this issue appeared after upgrading to version 0.15.3, review any changes related to file handling or the word_extractor.py module in this version. There might be a bug introduced in the recent update.
4. Check for Known Issues: Look for any known issues or patches related to this problem in the Dify repository or its documentation.

By following these steps, you should be able to identify the root cause of the issue and apply a suitable fix.


Mar 01 '25 11:03 dosubot[bot]

I checked the temp directory and found that /tmp/tmp8umf7jfp/ contains the files I uploaded through the API endpoint /datasets/{dataset_id}/document/create-by-file, but their size is 0 bytes. Documents uploaded directly through the Dify Web UI are in the same directory, and their sizes are correct.
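A 0-byte file would explain the misleading "Package not found" message: a .docx is a ZIP archive, and as far as I can tell from the traceback and python-docx's phys_pkg.py, the error is raised when the path is neither a directory nor a valid ZIP file, even if the file exists. Below is a minimal sketch of a check that tells these cases apart. The function name `diagnose_docx` and its return labels are made up for this example.

```python
import os
import zipfile


def diagnose_docx(path: str) -> str:
    """Classify a .docx path as 'missing', 'empty', 'not-a-zip' or 'ok'."""
    # Hypothetical helper for illustration only; labels are arbitrary.
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"  # the case observed in the temp directory
    if not zipfile.is_zipfile(path):
        return "not-a-zip"  # e.g. a legacy .doc renamed to .docx
    return "ok"


if __name__ == "__main__":
    # Path taken from the error message above.
    print(diagnose_docx("/tmp/tmp8umf7jfp/rgke_f3n.docx"))
```

Running this against the file named in the error message should show whether the upload arrived empty or in a non-ZIP format.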

Mar 02 '25 13:03 rchlz

May I ask, have you solved this problem? How did you do it, I had this problem too!

Mar 07 '25 17:03 miaoxiannv

> May I ask, have you solved this problem? How did you do it, I had this problem too!

Maybe the file format has been changed, for example, from doc to docx

Mar 19 '25 05:03 liuguoxionglang

> May I ask, have you solved this problem? How did you do it, I had this problem too! Maybe the file format has been changed, for example, from doc to docx

I think it's a joke just because word was opened up

Mar 19 '25 09:03 miaoxiannv

> May I ask, have you solved this problem? How did you do it, I had this problem too!

I have solved this problem. I was using the Python requests library incorrectly to upload files.
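The comment above does not say what the mistake was. As one hypothetical illustration, a common way to upload 0 bytes with requests is to pass a file object that has already been read to the end, because requests sends only what is left from the current position. The sketch below guards against that. The names `remaining_bytes` and `upload_docx` are made up for this example, and the `data` form field, its contents, the Bearer header, and a `base_url` ending in `/v1` are assumptions based on my reading of the Dify knowledge base API docs, so check them against your version.

```python
import json
import os


def remaining_bytes(fh) -> int:
    """Number of bytes left to read from the file object's current position."""
    pos = fh.tell()
    fh.seek(0, os.SEEK_END)
    end = fh.tell()
    fh.seek(pos)  # restore the original position
    return end - pos


def upload_docx(base_url: str, api_key: str, dataset_id: str, path: str) -> dict:
    """Sketch of a multipart upload to create-by-file (field names are assumptions)."""
    import requests  # third-party; imported here so remaining_bytes works without it

    url = f"{base_url}/datasets/{dataset_id}/document/create-by-file"
    # Assumed settings payload, sent as a JSON string in the "data" form field.
    settings = {"indexing_technique": "high_quality", "process_rule": {"mode": "automatic"}}
    with open(path, "rb") as fh:  # binary mode, freshly opened, position 0
        if remaining_bytes(fh) == 0:
            raise ValueError(f"nothing to upload from {path!r}")
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},  # no manual Content-Type
            data={"data": json.dumps(settings)},
            files={"file": (os.path.basename(path), fh)},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Leaving Content-Type unset lets requests generate the multipart boundary itself. If a file object is reused after an earlier read, calling `fh.seek(0)` before the upload restores the full content.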

Mar 29 '25 05:03 rchlz