Doc Extractor does not support parsing Markdown files
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.10.1
Cloud or Self Hosted
Self Hosted (Docker), Self Hosted (Source)
Steps to reproduce
- upload a markdown file named test1.md
- talk to the bot
✔️ Expected Behavior
Properly parse markdown files.
❌ Actual Behavior
The run fails with the error: Run failed: Unsupported MIME type: application/octet-stream
The issue you're encountering with the Doc Extractor not supporting markdown files seems to be related to the MIME type being incorrectly identified as application/octet-stream. The Doc Extractor does have specific handling for markdown files when the MIME type is correctly identified as text/markdown [1].
To resolve this, you might want to ensure that the file upload process correctly identifies the MIME type of markdown files. This could involve checking the file upload configuration or the environment where the file is being uploaded to ensure that markdown files are recognized with the correct MIME type. If the MIME type is not being set correctly, it might default to application/octet-stream, which is not supported by the Doc Extractor.
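As a rough illustration of that idea, here is a minimal sketch of an extension-based fallback that could run before the file reaches the extractor. This is not Dify's actual code: the function name resolve_mime_type and the EXTENSION_FALLBACKS table are hypothetical, and it assumes the upload layer reports either nothing or application/octet-stream when it cannot identify the file.

```python
import os
from typing import Optional

# Hypothetical fallback table (an assumption for illustration, not Dify's mapping).
EXTENSION_FALLBACKS = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".txt": "text/plain",
}

# MIME values that carry no real information about the file's content.
GENERIC_MIME_TYPES = {"", "application/octet-stream"}


def resolve_mime_type(filename: str, reported_mime_type: Optional[str]) -> str:
    """Return the reported MIME type, or guess from the extension if it is generic."""
    reported = (reported_mime_type or "").strip().lower()
    if reported not in GENERIC_MIME_TYPES:
        # The uploader gave a specific type, so trust it.
        return reported
    # Otherwise fall back to the file extension (case-insensitive).
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_FALLBACKS.get(extension, "application/octet-stream")


if __name__ == "__main__":
    print(resolve_mime_type("test1.md", "application/octet-stream"))
```

With a fallback like this, test1.md uploaded as application/octet-stream would be treated as text/markdown, while files with unknown extensions keep the generic type and still fail loudly.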
It seems to be working for me on the Cloud version.
@crazywoola I use the docker-compose version and the source code version for testing. Can you try it?
Can you try this PR: https://github.com/langgenius/dify/pull/9543 ? If you have further questions, you can DM me on WeChat: crazyphage.