Doc Extractor only extracts first sheet from XLS

Open · infinitnet opened this issue 1 year ago • 1 comment

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Upload an XLS file with multiple sheets (tabs) in it and run the Doc Extractor.

✔️ Expected Behavior

Doc Extractor should extract all sheets (tabs) within the XLS.

❌ Actual Behavior

Doc Extractor only extracts the first sheet (tab) in the XLS file.

infinitnet · Oct 22 '24 10:10

The issue you're experiencing with the Doc Extractor only extracting the first sheet from an XLS file might be related to how the extraction process is implemented. The current implementation of the Doc Extractor uses the pandas library with the xlrd engine to read XLS files. It iterates over each sheet in the file, so it should theoretically handle multiple sheets [1].

However, if only the first sheet is being extracted, it could be due to a bug or a specific condition in your setup. There is error handling in place that raises a TextExtractionError if an exception occurs during the extraction process, which might provide insights if there are any underlying issues [2].

To further investigate, you might want to check the logs for any errors or exceptions that occur during the extraction process. Additionally, reviewing the code to ensure that the iteration over sheets is correctly implemented and not inadvertently skipped could be helpful.
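For comparison while reviewing, here is a minimal sketch of what iterating over every sheet can look like with pandas. It is not Dify's actual code: the function names `sheets_to_text` and `extract_xls_text` are hypothetical, the CSV output format is an arbitrary choice for illustration, and reading legacy `.xls` files assumes the `xlrd` package is installed.

```python
import pandas as pd


def sheets_to_text(workbook) -> str:
    """Render every sheet of an opened workbook as CSV text, one section per sheet.

    `workbook` is any object exposing `sheet_names` and `parse(sheet_name=...)`,
    such as a `pandas.ExcelFile`. (Hypothetical helper, not part of Dify.)
    """
    sections = []
    # Loop over all sheet names instead of reading only the default (first) sheet.
    for name in workbook.sheet_names:
        df = workbook.parse(sheet_name=name)
        # Drop rows that are completely empty so blank padding is not emitted.
        df = df.dropna(how="all")
        sections.append(f"Sheet: {name}\n{df.to_csv(index=False)}")
    return "\n".join(sections)


def extract_xls_text(path: str) -> str:
    """Open an .xls file and return the text of all its sheets.

    Assumes the `xlrd` engine is available for the legacy .xls format.
    """
    with pd.ExcelFile(path, engine="xlrd") as workbook:
        return sheets_to_text(workbook)
```

If the output of a call such as `extract_xls_text("workbook.xls")` contains a `Sheet: ...` header for each tab while the Doc Extractor returns only the first one, the problem is more likely in the extractor's sheet handling than in the file itself. Note that `pd.read_excel` reads only the first sheet by default unless `sheet_name=None` is passed.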


dosubot[bot] · Oct 22 '24 10:10

[image] Only 1 sheet extracted.

yujchn · Nov 29 '24 09:11

@takatost, the user yujchn has indicated that only one sheet is being extracted from the XLS file, which is still a relevant issue. Could you please assist them with this?

dosubot[bot] · Nov 29 '24 09:11