Merged cell conversion issue in both Excel and PPTX

Open yutaixi opened this issue 11 months ago • 5 comments

After conversion to Markdown, merged cells in Excel are split, producing a table with new empty cells. This may lead to incorrect or lost information.

Before conversion: Image

After conversion: Image

yutaixi avatar Dec 20 '24 09:12 yutaixi

Markdown tables do not support merged cells. This is a limitation of the Markdown language itself, not of markitdown. HTML does allow this flexibility, so it would be more flexible to have an "xml-it" tool alongside the Markdown one. That said, I think Markdown can also embed custom HTML, so perhaps the developers could eventually use that markup for tables.
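To make the HTML point concrete, here is a minimal sketch of rendering a grid with merged ranges as an HTML table using `rowspan`/`colspan`. It is not markitdown code: the function name `to_html_table`, the grid layout, and the `(first_row, first_col, last_row, last_col)` range format are all assumptions made for illustration.

```python
from html import escape
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]
# Hypothetical range format: (first_row, first_col, last_row, last_col),
# zero-based and inclusive.
MergedRange = Tuple[int, int, int, int]


def to_html_table(grid: Grid, merges: List[MergedRange]) -> str:
    """Render a grid as an HTML table, keeping merged cells merged."""
    span_at = {}     # anchor (row, col) -> (rowspan, colspan)
    covered = set()  # cells hidden behind an anchor cell
    for r0, c0, r1, c1 in merges:
        span_at[(r0, c0)] = (r1 - r0 + 1, c1 - c0 + 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                if (r, c) != (r0, c0):
                    covered.add((r, c))

    rows = []
    for r, row in enumerate(grid):
        cells = []
        for c, value in enumerate(row):
            if (r, c) in covered:
                continue  # already represented by the anchor's span
            rowspan, colspan = span_at.get((r, c), (1, 1))
            attrs = ""
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            text = "" if value is None else escape(str(value))
            cells.append(f"<td{attrs}>{text}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


grid = [
    ["Region", "Quarter", "Sales"],
    ["North", "Q1", "10"],
    [None, "Q2", "12"],  # "North" is merged down from the row above
]
html_table = to_html_table(grid, [(1, 0, 2, 0)])
print(html_table)
```

Whether the embedded HTML actually renders depends on the Markdown viewer, since some renderers sanitize or ignore raw HTML.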

GaboEspadas avatar Jan 15 '25 12:01 GaboEspadas

I also ran into similar problems when parsing tables. I work with very large, complex tables that contain many merged cells with large spans.

Without an option to parse merged cells, my tables are basically unreadable.

I added an option in PR #1165 to support parsing merged cells and headers in Excel and filling their values into the child cells.

Combined with the expansion of parent and child items in the table, which I implemented myself, this can greatly improve an LLM's understanding of the table.
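For readers unfamiliar with the idea, the "fill values into child cells" behaviour can be sketched in plain Python as below. This is only an illustration of the technique, not the implementation in PR #1165 and not markitdown's API: `fill_merged`, `to_markdown`, and the `(first_row, first_col, last_row, last_col)` range format are hypothetical names and conventions chosen for this example.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]
# Hypothetical range format: (first_row, first_col, last_row, last_col),
# zero-based and inclusive.
MergedRange = Tuple[int, int, int, int]


def fill_merged(grid: Grid, merges: List[MergedRange]) -> Grid:
    """Return a copy of grid where every cell of a merged range
    carries the value of the range's top-left (anchor) cell."""
    filled = [row[:] for row in grid]  # leave the input untouched
    for r0, c0, r1, c1 in merges:
        anchor = grid[r0][c0]
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                filled[r][c] = anchor
    return filled


def to_markdown(grid: Grid) -> str:
    """Render a grid as a simple Markdown table (first row is the header).
    Cell values containing '|' are not escaped in this sketch."""
    def fmt(row):
        return "| " + " | ".join("" if v is None else str(v) for v in row) + " |"

    header, *body = grid
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


grid = [
    ["Region", "Quarter", "Sales"],
    ["North", "Q1", "10"],
    [None, "Q2", "12"],  # "North" is merged down from the row above
    ["South", "Q1", "7"],
]
filled = fill_merged(grid, [(1, 0, 2, 0)])
print(to_markdown(filled))
```

Repeating the anchor value makes each row self-describing, at the cost of no longer showing that the cells were merged in the source.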

Image

Image

BetterAndBetterII avatar Apr 01 '25 04:04 BetterAndBetterII

@BetterAndBetterII, hey bro, can you show me how to use the fill_merged_cells argument in Python code?

wei12314 avatar May 09 '25 06:05 wei12314

Same issue here. Were you guys able to figure it out?

cooleel avatar Jul 09 '25 11:07 cooleel

@wei12314 Have you tried it? If so, did you need to customize the code in some of the files that BetterAndBetterII changed in the 4 commits?

phamkhactu avatar Sep 18 '25 06:09 phamkhactu