Dataset: Failed to upload xlsx file

Open bowenliang123 opened this issue 1 year ago • 1 comment

Self Checks

Dify version

0.5.0

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

When uploading an xlsx file to a dataset, the request fails with a 500 Internal Server Error.

Logs from the API container (Docker Compose):

api_1          | [2024-01-24 08:31:07.160][ERROR][app.py][1414] Exception on /console/api/datasets/indexing-estimate [POST]
api_1          | Traceback (most recent call last):
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/base.py", line 59, in _convert
api_1          |     value = expected_type(value)
api_1          | TypeError: Fill() takes no arguments
api_1          | 
api_1          | During handling of the above exception, another exception occurred:
api_1          | 
api_1          | Traceback (most recent call last):
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
api_1          |     rv = self.dispatch_request()
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
api_1          |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 467, in wrapper
api_1          |     resp = resource(*args, **kwargs)
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask/views.py", line 109, in view
api_1          |     return current_app.ensure_sync(self.dispatch_request)(**kwargs)
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
api_1          |     resp = meth(*args, **kwargs)
api_1          |   File "/app/api/controllers/console/setup.py", line 77, in decorated
api_1          |     return view(*args, **kwargs)
api_1          |   File "/app/api/libs/login.py", line 90, in decorated_view
api_1          |     return current_app.ensure_sync(func)(*args, **kwargs)
api_1          |   File "/app/api/controllers/console/wraps.py", line 19, in decorated
api_1          |     return view(*args, **kwargs)
api_1          |   File "/app/api/controllers/console/datasets/datasets.py", line 284, in post
api_1          |     response = indexing_runner.file_indexing_estimate(current_user.current_tenant_id, file_details,
api_1          |   File "/app/api/core/indexing_runner.py", line 287, in file_indexing_estimate
api_1          |     text_docs = FileExtractor.load(file_detail, is_automatic=processing_rule.mode == 'automatic')
api_1          |   File "/app/api/core/data_loader/file_extractor.py", line 36, in load
api_1          |     return cls.load_from_file(file_path, return_text, upload_file, is_automatic)
api_1          |   File "/app/api/core/data_loader/file_extractor.py", line 106, in load_from_file
api_1          |     return delimiter.join([document.page_content for document in loader.load()]) if return_text else loader.load()
api_1          |   File "/app/api/core/data_loader/loader/excel.py", line 30, in load
api_1          |     wb = load_workbook(filename=self._file_path, read_only=True)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
api_1          |     reader.read()
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/reader/excel.py", line 299, in read
api_1          |     apply_stylesheet(self.archive, self.wb)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
api_1          |     stylesheet = Stylesheet.from_tree(node)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
api_1          |     return super(Stylesheet, cls).from_tree(node)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
api_1          |     return cls(**attrib)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/styles/stylesheet.py", line 74, in __init__
api_1          |     self.fills = fills
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/sequence.py", line 27, in __set__
api_1          |     seq = self.container(_convert(self.expected_type, value) for value in seq)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/sequence.py", line 27, in <genexpr>
api_1          |     seq = self.container(_convert(self.expected_type, value) for value in seq)
api_1          |   File "/usr/local/lib/python3.10/site-packages/openpyxl/descriptors/base.py", line 61, in _convert
api_1          |     raise TypeError('expected ' + str(expected_type))
api_1          | TypeError: expected <class 'openpyxl.styles.fills.Fill'>

A later request to the same endpoint failed with a different error, raised in csv_loader.py:

api_1          | [2024-01-24 09:09:03.945][ERROR][app.py][1414] Exception on /console/api/datasets/indexing-estimate [POST]
api_1          | Traceback (most recent call last):
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
api_1          |     rv = self.dispatch_request()
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
api_1          |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 467, in wrapper
api_1          |     resp = resource(*args, **kwargs)
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask/views.py", line 109, in view
api_1          |     return current_app.ensure_sync(self.dispatch_request)(**kwargs)
api_1          |   File "/usr/local/lib/python3.10/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
api_1          |     resp = meth(*args, **kwargs)
api_1          |   File "/app/api/controllers/console/setup.py", line 77, in decorated
api_1          |     return view(*args, **kwargs)
api_1          |   File "/app/api/libs/login.py", line 90, in decorated_view
api_1          |     return current_app.ensure_sync(func)(*args, **kwargs)
api_1          |   File "/app/api/controllers/console/wraps.py", line 19, in decorated
api_1          |     return view(*args, **kwargs)
api_1          |   File "/app/api/controllers/console/datasets/datasets.py", line 284, in post
api_1          |     response = indexing_runner.file_indexing_estimate(current_user.current_tenant_id, file_details,
api_1          |   File "/app/api/core/indexing_runner.py", line 287, in file_indexing_estimate
api_1          |     text_docs = FileExtractor.load(file_detail, is_automatic=processing_rule.mode == 'automatic')
api_1          |   File "/app/api/core/data_loader/file_extractor.py", line 36, in load
api_1          |     return cls.load_from_file(file_path, return_text, upload_file, is_automatic)
api_1          |   File "/app/api/core/data_loader/file_extractor.py", line 106, in load_from_file
api_1          |     return delimiter.join([document.page_content for document in loader.load()]) if return_text else loader.load()
api_1          |   File "/app/api/core/data_loader/loader/csv_loader.py", line 31, in load
api_1          |     docs = self._read_from_file(csvfile)
api_1          |   File "/app/api/core/data_loader/loader/csv_loader.py", line 52, in _read_from_file
api_1          |     content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
api_1          |   File "/app/api/core/data_loader/loader/csv_loader.py", line 52, in <genexpr>
api_1          |     content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
api_1          | AttributeError: 'NoneType' object has no attribute 'strip'
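
The second traceback fails on `k.strip()` / `v.strip()` while formatting a CSV row. One plausible cause, assuming the loader iterates over `csv.DictReader` rows (the thread does not show that code), is a ragged row: `DictReader` fills missing cells with `None` and collects surplus cells under a `None` key. The sketch below shows a None-tolerant version of that formatting step; `rows_to_text` is a hypothetical helper, not Dify's actual loader.

```python
import csv
import io


def rows_to_text(csvfile):
    """Format each CSV row as "header: value" lines, tolerating ragged rows."""
    docs = []
    # restval="" fills missing trailing cells with "" instead of None.
    for row in csv.DictReader(csvfile, restval=""):
        parts = []
        for key, value in row.items():
            if key is None:
                # Cells beyond the header are collected under the None key; skip them.
                continue
            parts.append(f"{key.strip()}: {(value or '').strip()}")
        docs.append("\n".join(parts))
    return docs


# A short row ("Bob" has no city) no longer raises AttributeError.
sample = io.StringIO("name,city\nAda,London\nBob\n")
for doc in rows_to_text(sample):
    print(doc)
    print("---")
```

Skipping surplus cells is a design choice made for brevity here; a real loader might prefer to log them or reject the file.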

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

bowenliang123, Jan 24 '24 08:01

🤖

Based on the error logs you've provided, it seems like the issue is related to the openpyxl library, which is used for reading and writing Excel files. The error message TypeError: Fill() takes no arguments suggests that there might be an issue with the way the library is handling the formatting or styling of your Excel file.

This could be due to a variety of reasons, such as a specific feature or formatting in your Excel file that is not supported by the version of openpyxl that Dify is using. It could also be a bug in the openpyxl library itself.

As a workaround, you could try to remove any complex formatting or styles from your Excel file and try uploading it again. If the issue persists, it might be helpful to provide a sample Excel file that can reproduce the issue. This would help in identifying the exact cause of the problem.
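
To check whether styling is the trigger before stripping anything, the workbook's stylesheet can be inspected directly, since an xlsx file is a zip archive. The sketch below is a diagnostic under assumptions not confirmed in this thread: that the error comes from a `<fill>` entry with no child element in `xl/styles.xml`, and that the file uses the standard (transitional) SpreadsheetML namespace. `count_empty_fills` is a hypothetical helper name, and it uses only the Python standard library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Transitional SpreadsheetML namespace; strict-mode files use a different one (assumption).
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def count_empty_fills(xlsx):
    """Count <fill> entries in xl/styles.xml that have no child element."""
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/styles.xml"))
    fills = root.findall("m:fills/m:fill", NS)
    return sum(1 for fill in fills if len(fill) == 0)


# Demo on an in-memory archive holding only a minimal stylesheet (not a full workbook).
styles = (
    '<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<fills count="2"><fill><patternFill patternType="none"/></fill><fill/></fills>'
    "</styleSheet>"
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("xl/styles.xml", styles)
demo.seek(0)
print("empty fills:", count_empty_fills(demo))
```

For a real file, pass its path instead of the in-memory buffer. A non-zero count would support the styling hypothesis; a zero count would suggest looking elsewhere.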

For further investigation, you could also check the version of openpyxl that Dify is using and see if there are any known issues or updates related to this error. You can do this by running pip show openpyxl in your Dify environment.
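
The same version check can be done from Python, which is convenient inside the API container where the traceback shows Python 3.10. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the openpyxl version, or None if it is not installed in this environment.
print("openpyxl:", installed_version("openpyxl"))
```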

Please note that this is just a potential cause of the issue based on the provided error logs. There might be other factors at play here, and without more information, it's hard to pinpoint the exact cause. If you could provide more details about the Excel file you're trying to upload (e.g., its size, the number of rows/columns, the type of data it contains, etc.), it would be easier to diagnose the issue.

For more information about the openpyxl library and its usage, you can refer to its official documentation.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

dosubot[bot], Jan 24 '24 08:01