docx2python

Is there any way to extract the table into markdown format?

Open ChanghaoLau opened this issue 10 months ago • 1 comments

I want to extract the tables in a .docx file into markdown format while preserving each table's position in the document. So I can't use python-docx's document.paragraphs and document.tables to handle paragraphs and tables separately (this would destroy the positional relationship between them).

docx2python is very easy to use. I would like to know whether docx2python can save tables in markdown format, or whether it can separate tables, images and paragraphs in output.body. Thank you!

ChanghaoLau • Apr 24 '24 07:04

I am going to leave this issue open for a bit and think about how this might be seamlessly accomplished. Until then, here’s a script that will identify tables for you.

https://github.com/ShayHill/transpose_docx_tables

ShayHill • Apr 25 '24 18:04

As of docx2python v3.0.0, tables are guaranteed to be n x m (n rows by m columns) and are straightforward to identify. See the details near the top of the README file. I've also left an example of exporting tables as markdown in the tests folder. It's referenced in the README.
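For anyone who wants a rough idea of the conversion step before opening the tests folder, here is a minimal sketch. It is not the example from the repository. It assumes you have already pulled one n x m table out of the output as a list of rows, where each row is a list of cells and each cell is a list of paragraph strings. The function name `table_to_markdown` is made up for illustration, and treating the first row as the header is my own choice.

```python
def table_to_markdown(table):
    """Render one n x m table as a markdown table string.

    Assumption (not taken from the docx2python docs): `table` is a list of
    rows, each row a list of cells, each cell either a plain string or a
    list of paragraph strings. The first row is used as the header.
    """

    def cell_text(cell):
        # Accept a bare string or a list of paragraph strings.
        paragraphs = [cell] if isinstance(cell, str) else cell
        # Markdown table cells cannot span lines, so flatten newlines
        # and join the non-empty paragraphs with <br>.
        cleaned = [p.replace("\n", " ").strip() for p in paragraphs]
        text = "<br>".join(p for p in cleaned if p)
        # A literal pipe would end the cell early, so escape it.
        return text.replace("|", "\\|")

    rows = [[cell_text(cell) for cell in row] for row in table]
    if not rows:
        return ""

    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data standing in for a table extracted from a .docx file.
    demo = [
        [["Name"], ["Qty"]],
        [["apple"], ["1", "2"]],
    ]
    print(table_to_markdown(demo))
```

Because every row has the same number of cells, no padding logic is needed. How you pick the real tables out of output.body is covered in the README and is not shown here.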

ShayHill • Jul 27 '24 22:07