Dify Doc Extractor does not support parsing Markdown files

Self Checks

[X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.1

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

Upload a Markdown file named test1.md.
Talk to the bot.

✔️ Expected Behavior

Properly parse markdown files.

❌ Actual Behavior

The run fails with: Run failed: Unsupported MIME type: application/octet-stream

Oct 24 '24 03:10 yaoice

The issue you're encountering with the Doc Extractor not supporting markdown files seems to be related to the MIME type being incorrectly identified as application/octet-stream. The Doc Extractor does have specific handling for markdown files when the MIME type is correctly identified as text/markdown ^[1].

To resolve this, you might want to ensure that the file upload process correctly identifies the MIME type of markdown files. This could involve checking the file upload configuration or the environment where the file is being uploaded to ensure that markdown files are recognized with the correct MIME type. If the MIME type is not being set correctly, it might default to application/octet-stream, which is not supported by the Doc Extractor.
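As a rough illustration of that fallback idea, here is a minimal Python sketch. It is not Dify's actual code: the helper name `resolve_mime_type` and the `FALLBACK_MIME_BY_EXTENSION` table are hypothetical, and the only library call it relies on is the standard `mimetypes.guess_type`. It assumes the upload layer reported `application/octet-stream` (or nothing) and tries to recover a more specific type from the file extension.

```python
import mimetypes
import os
from typing import Optional

GENERIC_MIME = "application/octet-stream"

# Hypothetical extension table (an assumption for this sketch, not part of Dify).
FALLBACK_MIME_BY_EXTENSION = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
}


def resolve_mime_type(filename: str, reported: Optional[str] = None) -> str:
    """Return a usable MIME type for an uploaded file.

    Order of preference:
    1. the reported type, unless it is missing or the generic octet-stream;
    2. the hypothetical extension table above;
    3. the standard library's guess from the filename;
    4. the generic octet-stream type.
    """
    if reported and reported != GENERIC_MIME:
        return reported

    # Compare extensions case-insensitively so "NOTES.MD" also matches.
    extension = os.path.splitext(filename)[1].lower()
    if extension in FALLBACK_MIME_BY_EXTENSION:
        return FALLBACK_MIME_BY_EXTENSION[extension]

    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or GENERIC_MIME


if __name__ == "__main__":
    print(resolve_mime_type("test1.md", "application/octet-stream"))
```

The extension table is consulted before `mimetypes.guess_type` because the standard library's answer for `.md` can vary with the Python version and the system's MIME database, so an explicit mapping keeps the result predictable.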


Oct 24 '24 03:10 dosubot[bot]

It seems to work for me on the Cloud version.

Oct 24 '24 04:10 crazywoola

It seems to work for me on the Cloud version.

@crazywoola I tested with both the docker-compose deployment and the source-code deployment. Could you try those?

Oct 24 '24 05:10 yaoice

Can you try this PR? https://github.com/langgenius/dify/pull/9543 If you have further questions, you can DM me on WeChat: crazyphage.

Oct 24 '24 06:10 crazywoola