Document extractor produces malformed Markdown when Excel cells contain vertical bars ("|")
Self Checks
- [x] I have read the Contributing Guide and Language Policy.
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report, otherwise it will be closed.
- [x] [Chinese & Non English User] Please submit in English, otherwise the issue will be closed :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.9.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Process an Excel file whose cells contain vertical bars ("|") with the document extractor. The resulting Markdown (md) output is malformed.
✔️ Expected Behavior
When an Excel file whose cells contain vertical bars ("|") is processed with the document extractor, the resulting Markdown (md) output should stay well-formed. At present it becomes messy, which affects subsequent processing by large models. Could the extractor output CSV instead, or escape the vertical bars as "\|"?
❌ Actual Behavior
The resulting Markdown (md) output is malformed: vertical bars inside cell content are not escaped, so they are read as column separators.
The root cause is that vertical bars (|) in cell content are not escaped during Markdown table generation, which breaks the table formatting. Dify's extractors currently have no built-in support for escaping these bars or for exporting Excel data as CSV instead of Markdown. The Excel extractor outputs semicolon-separated key-value pairs, while other extractors (such as those for Word or Notion) generate Markdown tables but likewise do not escape vertical bars, so the same issue arises whenever cell content contains |.
As a workaround, you can preprocess your Excel data to replace | with \| before uploading, or post-process the markdown output to escape problematic characters. To fix this at the source, changes would be needed in the backend table-to-markdown logic to escape vertical bars in cell content before joining them into markdown tables.
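The escaping step described above could look roughly like the following. This is a minimal sketch, not Dify's actual code: `escape_cell` and `rows_to_markdown` are hypothetical helper names, and the input is assumed to be a header list plus a list of row lists that have already been read from the spreadsheet.

```python
def escape_cell(value) -> str:
    """Render one cell as text that is safe inside a Markdown table row.

    Hypothetical helper, not part of Dify.
    """
    text = "" if value is None else str(value)
    # An unescaped "|" would be read as a column separator, so escape it.
    text = text.replace("|", "\\|")
    # A line break would end the table row, so collapse it to a space.
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def rows_to_markdown(header, rows) -> str:
    """Build a Markdown table from a header list and a list of row lists."""
    lines = [
        "| " + " | ".join(escape_cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(escape_cell(c) for c in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown(["name", "pattern"], [["either", "a|b"]]))
```

Escaping has to happen per cell, before the cells are joined. Once the row has been assembled, a bar that came from cell content can no longer be told apart from a column separator, which is why post-processing the finished Markdown is the less reliable workaround.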
If you need CSV output, you would need to implement or request a feature to support exporting Excel data as CSV during extraction, as this is not currently available.
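Until such a feature exists, CSV can be produced outside the extractor. The sketch below uses only Python's standard `csv` module; `rows_to_csv` is a hypothetical helper, and it assumes the rows have already been loaded from the workbook by some other means.

```python
import csv
import io


def rows_to_csv(header, rows) -> str:
    """Serialize a header list and a list of row lists as CSV text.

    Hypothetical helper, not part of Dify.
    """
    buffer = io.StringIO()
    # csv.writer quotes fields that contain commas, quotes, or line breaks;
    # "|" has no special meaning in CSV and passes through unchanged.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(["name", "pattern"], [["either", "a|b"], ["list", "x,y"]]))
```

Because the vertical bar is an ordinary character in CSV, no escaping of cell content is needed for this issue; only commas, quotes, and line breaks trigger quoting.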