docx2python

Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.

Results 4 docx2python issues

I want to extract the tables in a .docx file as Markdown while keeping each table's position in the document. So I can't use `python-docx`'s `document.paragraphs` and `document.tables`...
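One way to keep tables in position is to walk the children of the document body in order, rather than reading paragraphs and tables as two separate lists. The sketch below is not docx2python's or python-docx's API; it uses only the standard library, and the helper names (`docx_to_markdown`, `_table_to_markdown`, `_text`) are hypothetical. It assumes a transitional-format .docx, treats the first table row as the header, and ignores merged cells, nested tables, and `|` characters inside cells.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumed; strict files use another URI).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(element):
    """Concatenate the text of every w:t element below `element`."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def _table_to_markdown(tbl):
    """Render a w:tbl element as a Markdown table; the first row is the header."""
    rows = [
        [_text(tc) for tc in tr.findall(W + "tc")]
        for tr in tbl.findall(W + "tr")
    ]
    if not rows:
        return ""
    lines = [
        "| " + " | ".join(rows[0]) + " |",
        "| " + " | ".join("---" for _ in rows[0]) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


def docx_to_markdown(source):
    """Return body paragraphs and tables, in document order, as Markdown blocks.

    `source` is a path or a binary file-like object containing a .docx archive.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(W + "body")
    blocks = []
    for child in body:  # direct children only, so document order is preserved
        if child.tag == W + "p":
            text = _text(child)
            if text:
                blocks.append(text)
        elif child.tag == W + "tbl":
            blocks.append(_table_to_markdown(child))
    return "\n\n".join(blocks)
```

Calling `docx_to_markdown("contract.docx")` (a hypothetical file name) returns one string in which each table appears between the paragraphs that surround it in the source document.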

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.5.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.5.0...v4.6.0)
- [github.com/psf/black: 24.3.0 → 24.4.0](https://github.com/psf/black/compare/24.3.0...24.4.0)
- https://github.com/charliermarsh/ruff-pre-commit → https://github.com/astral-sh/ruff-pre-commit
- [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.4.1](https://github.com/astral-sh/ruff-pre-commit/compare/v0.3.5...v0.4.1)
- [github.com/RobertCraigie/pyright-python: v1.1.356 → v1.1.359](https://github.com/RobertCraigie/pyright-python/compare/v1.1.356...v1.1.359)

When I run `from docx2python import docx2python`, then `doc_result = docx2python(file_path)`, and look at `doc_result.body[2][0]`, the result looks like this: [[14)\t\t\t其他', '', '\t1)\t\t\t税费. 除非双方另有明确约定,由任何机关/个人征收的与该商铺使用有关的任何种类的税金、行政规费、收费、费用等应由甲方承担,除非适用法律另有规定。', '',]], but the real content is: 14. 其他 14.1...

It would be great if the strict format was supported for docx/docm files. I think it basically just requires different ns tags to be used. Here are the tags used...
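As a rough illustration of the namespace difference, the sketch below reads the WordprocessingML namespace from the root element of `word/document.xml` and uses it to build tag names. It is not docx2python's implementation, and the function names are hypothetical. It covers only the main `w` namespace; the issue's own tag list is truncated above and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace URIs for the two conformance classes.
TRANSITIONAL = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def wordprocessing_namespace(root):
    """Return the namespace URI of the root element, or raise if unrecognized."""
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    uri = root.tag[1:].split("}", 1)[0]
    if uri not in (TRANSITIONAL, STRICT):
        raise ValueError(f"unexpected namespace: {uri}")
    return uri


def paragraph_texts(document_xml):
    """Return the text of each paragraph, whichever namespace the file uses."""
    root = ET.fromstring(document_xml)
    ns = "{" + wordprocessing_namespace(root) + "}"
    return [
        "".join(t.text or "" for t in p.iter(ns + "t"))
        for p in root.iter(ns + "p")
    ]
```

Deriving the prefix from the root element keeps the rest of the parsing code identical for both formats, at the cost of assuming every other part of the package follows the same conformance class.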