Support non-standard headings for Word
Requested feature
For documents created in a non-English version of Word, the heading style names differ from "Heading". For example, the German default is Überschrift 1. I understand that it is not feasible to support every localized version of Word. Hence, it would make sense to let users pass a config to the MsWordDocumentBackend.
Hi @Manuel030, yes, this is indeed a problem with MS Office formats we are aware of. Let us have an iteration on this topic to see if we can find a scalable solution where users are not required to provide extra configuration.
@Manuel030 I'm looking for a language-agnostic solution.
@Manuel030, any chance you could share an example document with a heading created in MS Word with German localization (i.e. with the Überschrift 1 style instead of Heading 1)? It would help us debug such cases, thx!
@Manuel030, @cau-git, I rewired the label detection logic to use style_id instead of the style name. This should make it agnostic to the MS Word localization: https://github.com/DS4SD/docling/pull/534
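For readers following along, here is a rough sketch of the difference between matching on the style name and matching on the style ID. It is not the code from the PR. Both helper functions are hypothetical, and the sketch assumes that built-in heading styles carry an ID of the form `Heading<N>` while the displayed name may be localized.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; not docling's implementation.
# Assumption: built-in heading style IDs look like "Heading1", "Heading2", ...
_HEADING_ID = re.compile(r"^Heading(\d+)$")
# Name-based matching only recognizes the English display name.
_HEADING_NAME = re.compile(r"^Heading (\d+)$")


def heading_level_from_style_id(style_id: Optional[str]) -> Optional[int]:
    """Return the heading level encoded in a style ID, or None if not a heading."""
    if not style_id:
        return None
    match = _HEADING_ID.match(style_id)
    return int(match.group(1)) if match else None


def heading_level_from_style_name(style_name: Optional[str]) -> Optional[int]:
    """Return the heading level from an English style name, or None otherwise."""
    if not style_name:
        return None
    match = _HEADING_NAME.match(style_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # A localized name is missed by the name-based check ...
    print(heading_level_from_style_name("Überschrift 1"))
    # ... while an ID of the assumed form is still recognized.
    print(heading_level_from_style_id("Heading1"))
```

With python-docx, the two values would come from `paragraph.style.name` and `paragraph.style.style_id`. Whether the ID really stays stable across every localization is exactly what an example document would help confirm.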
Hello. My Italian documents lose their headings and bold formatting when I try to export from docx to markdown. The same documents in PDF format were converted correctly. Why is this happening and how can I fix it?
Thank you