dify
Dataset: Failed to upload xlsx file
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to file this report (I have read and agree to the Language Policy).
Dify version
0.5.0
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
When uploading an xlsx file to a dataset, the request fails with a 500 Internal Server Error.
Logs from the API container (Docker Compose):
api_1 | [2024-01-24 08:31:07.160][ERROR][app.py][1414] Exception on /console/api/datasets/indexing-estimate [POST]
api_1 | Traceback (most recent call last):
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/base.py", line 59, in _convert
api_1 | value = expected_type(value)
api_1 | TypeError: Fill() takes no arguments
api_1 |
api_1 | During handling of the above exception, another exception occurred:
api_1 |
api_1 | Traceback (most recent call last):
api_1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
api_1 | rv = self.dispatch_request()
api_1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
api_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
api_1 | File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 467, in wrapper
api_1 | resp = resource(*args, **kwargs)
api_1 | File "/usr/local/lib/python3.10/site-packages/flask/views.py", line 109, in view
api_1 | return current_app.ensure_sync(self.dispatch_request)(**kwargs)
api_1 | File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
api_1 | resp = meth(*args, **kwargs)
api_1 | File "/app/api/controllers/console/setup.py", line 77, in decorated
api_1 | return view(*args, **kwargs)
api_1 | File "/app/api/libs/login.py", line 90, in decorated_view
api_1 | return current_app.ensure_sync(func)(*args, **kwargs)
api_1 | File "/app/api/controllers/console/wraps.py", line 19, in decorated
api_1 | return view(*args, **kwargs)
api_1 | File "/app/api/controllers/console/datasets/datasets.py", line 284, in post
api_1 | response = indexing_runner.file_indexing_estimate(current_user.current_tenant_id, file_details,
api_1 | File "/app/api/core/indexing_runner.py", line 287, in file_indexing_estimate
api_1 | text_docs = FileExtractor.load(file_detail, is_automatic=processing_rule.mode == 'automatic')
api_1 | File "/app/api/core/data_loader/file_extractor.py", line 36, in load
api_1 | return cls.load_from_file(file_path, return_text, upload_file, is_automatic)
api_1 | File "/app/api/core/data_loader/file_extractor.py", line 106, in load_from_file
api_1 | return delimiter.join([document.page_content for document in loader.load()]) if return_text else loader.load()
api_1 | File "/app/api/core/data_loader/loader/excel.py", line 30, in load
api_1 | wb = load_workbook(filename=self._file_path, read_only=True)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
api_1 | reader.read()
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/reader/excel.py", line 299, in read
api_1 | apply_stylesheet(self.archive, self.wb)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
api_1 | stylesheet = Stylesheet.from_tree(node)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
api_1 | return super(Stylesheet, cls).from_tree(node)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
api_1 | return cls(**attrib)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/styles/stylesheet.py", line 74, in __init__
api_1 | self.fills = fills
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/sequence.py", line 27, in __set__
api_1 | seq = self.container(_convert(self.expected_type, value) for value in seq)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/sequence.py", line 27, in <genexpr>
api_1 | seq = self.container(_convert(self.expected_type, value) for value in seq)
api_1 | File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/base.py", line 61, in _convert
api_1 | raise TypeError('expected ' + str(expected_type))
api_1 | TypeError: expected <class 'openpyxl.styles.fills.Fill'>
and a later request failed in the CSV loader:
api_1 | [2024-01-24 09:09:03.945][ERROR][app.py][1414] Exception on /console/api/datasets/indexing-estimate [POST]
api_1 | Traceback (most recent call last):
api_1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
api_1 | rv = self.dispatch_request()
api_1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
api_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
api_1 | File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 467, in wrapper
api_1 | resp = resource(*args, **kwargs)
api_1 | File "/usr/local/lib/python3.10/site-packages/flask/views.py", line 109, in view
api_1 | return current_app.ensure_sync(self.dispatch_request)(**kwargs)
api_1 | File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
api_1 | resp = meth(*args, **kwargs)
api_1 | File "/app/api/controllers/console/setup.py", line 77, in decorated
api_1 | return view(*args, **kwargs)
api_1 | File "/app/api/libs/login.py", line 90, in decorated_view
api_1 | return current_app.ensure_sync(func)(*args, **kwargs)
api_1 | File "/app/api/controllers/console/wraps.py", line 19, in decorated
api_1 | return view(*args, **kwargs)
api_1 | File "/app/api/controllers/console/datasets/datasets.py", line 284, in post
api_1 | response = indexing_runner.file_indexing_estimate(current_user.current_tenant_id, file_details,
api_1 | File "/app/api/core/indexing_runner.py", line 287, in file_indexing_estimate
api_1 | text_docs = FileExtractor.load(file_detail, is_automatic=processing_rule.mode == 'automatic')
api_1 | File "/app/api/core/data_loader/file_extractor.py", line 36, in load
api_1 | return cls.load_from_file(file_path, return_text, upload_file, is_automatic)
api_1 | File "/app/api/core/data_loader/file_extractor.py", line 106, in load_from_file
api_1 | return delimiter.join([document.page_content for document in loader.load()]) if return_text else loader.load()
api_1 | File "/app/api/core/data_loader/loader/csv_loader.py", line 31, in load
api_1 | docs = self._read_from_file(csvfile)
api_1 | File "/app/api/core/data_loader/loader/csv_loader.py", line 52, in _read_from_file
api_1 | content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
api_1 | File "/app/api/core/data_loader/loader/csv_loader.py", line 52, in <genexpr>
api_1 | content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
api_1 | AttributeError: 'NoneType' object has no attribute 'strip'
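For the second traceback, a likely trigger (an assumption, not confirmed against the uploaded file) is a row whose cell count differs from the header: Python's csv.DictReader fills missing cells with None and stores surplus cells under a None key, so either k.strip() or v.strip() can fail. Below is a minimal standard-library sketch of a tolerant version of that join. rows_to_text and the sample data are hypothetical and are not Dify's loader code.

```python
import csv
import io


def rows_to_text(csvfile):
    """Render each CSV row as "key: value" lines, tolerating ragged rows."""
    docs = []
    for row in csv.DictReader(csvfile):
        parts = []
        for k, v in row.items():
            if k is None:
                # Cells beyond the header are collected under the None key.
                continue
            # Missing cells arrive as None; treat them as empty strings.
            parts.append(f"{k.strip()}: {(v or '').strip()}")
        docs.append("\n".join(parts))
    return docs


# Hypothetical sample: one short row and one row with an extra cell.
sample = io.StringIO("name,city\nAda,London\nBob\nCy,Paris,extra\n")
docs = rows_to_text(sample)
for doc in docs:
    print(doc)
    print("---")
```

Skipping the None key drops the surplus cells silently; a loader that must not lose data could instead raise a clear validation error at that point.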
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
🤖
Based on the error logs you've provided, the issue appears to be related to the openpyxl library, which is used for reading and writing Excel files. The error message TypeError: Fill() takes no arguments suggests that the library is failing while parsing the fill styles in your Excel file's stylesheet.
This could have several causes, such as a feature or formatting in your Excel file that is not supported by the version of openpyxl that Dify is using. It could also be a bug in the openpyxl library itself.
As a workaround, you could try to remove any complex formatting or styles from your Excel file and try uploading it again. If the issue persists, it might be helpful to provide a sample Excel file that can reproduce the issue. This would help in identifying the exact cause of the problem.
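One thing that may produce this TypeError (an assumption, not verified against your file) is a fill entry in the workbook's stylesheet that has no child element. The sketch below checks for that using only the standard library, so it does not depend on openpyxl. count_empty_fills is a hypothetical helper name, xl/styles.xml is the conventional stylesheet path inside an xlsx archive, and the in-memory archive exists only for the demo.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by the stylesheet part.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def count_empty_fills(xlsx):
    """Return (total, empty) counts of <fill> entries in xl/styles.xml.

    `xlsx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/styles.xml"))
    fills = root.findall("m:fills/m:fill", NS)
    # A fill with no pattern or gradient child has zero sub-elements.
    empty = sum(1 for fill in fills if len(fill) == 0)
    return len(fills), empty


# Demo on a tiny archive containing one normal and one empty fill.
styles = (
    '<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<fills count="2"><fill><patternFill patternType="none"/></fill><fill/></fills>'
    "</styleSheet>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("xl/styles.xml", styles)
buf.seek(0)

total, empty = count_empty_fills(buf)
print(f"{empty} of {total} fill entries are empty")
```

If the count of empty entries is non-zero for your file, opening and re-saving it in a spreadsheet application before uploading may be worth trying.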
For further investigation, you could also check the version of openpyxl that Dify is using and see if there are any known issues or updates related to this error. You can do this by running pip show openpyxl in your Dify environment.
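If running pip inside the API container is inconvenient, the same check can be done from Python. This is a small sketch using the standard library's importlib.metadata. installed_version is a hypothetical helper name, and it reports only what is installed in the interpreter that runs it.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


openpyxl_version = installed_version("openpyxl")
print("openpyxl:", openpyxl_version or "not installed")
```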
Please note that this is just a potential cause of the issue based on the provided error logs. There might be other factors at play here, and without more information, it's hard to pinpoint the exact cause. If you could provide more details about the Excel file you're trying to upload (e.g., its size, the number of rows/columns, the type of data it contains, etc.), it would be easier to diagnose the issue.
For more information about the openpyxl library and its usage, you can refer to its official documentation.
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.