Support <strong> tag in HTML
Requested feature
Right now, if an HTML page contains a pseudo-header wrapped in a <strong> tag, docling skips it. An example page containing such a tag can be found here.
Ideally, docling would include that text as bold text.
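To illustrate the requested behavior, here is a minimal sketch that uses only Python's standard-library `html.parser`. It keeps text inside `<strong>`/`<b>` (and `<em>`/`<i>`) instead of dropping it, and tags each run with formatting flags. `TextRun`, `FormattingParser` and `extract_runs` are hypothetical names for illustration. They are not docling's API and not docling-core's `Formatting` model. For simplicity, whitespace around each run is stripped.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


# Hypothetical container for a run of text plus its formatting flags.
# This only mirrors the idea; it is NOT docling-core's Formatting model.
@dataclass
class TextRun:
    text: str
    bold: bool = False
    italic: bool = False


class FormattingParser(HTMLParser):
    """Collect text runs, tracking whether each sits inside bold/italic tags."""

    BOLD_TAGS = {"strong", "b"}
    ITALIC_TAGS = {"em", "i"}

    def __init__(self):
        super().__init__()
        # Depth counters handle nested tags such as <strong><b>x</b></strong>.
        self.bold_depth = 0
        self.italic_depth = 0
        self.runs: list[TextRun] = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag in self.BOLD_TAGS:
            self.bold_depth += 1
        elif tag in self.ITALIC_TAGS:
            self.italic_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.bold_depth > 0:
            self.bold_depth -= 1
        elif tag in self.ITALIC_TAGS and self.italic_depth > 0:
            self.italic_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only data between tags
            self.runs.append(
                TextRun(text, bold=self.bold_depth > 0, italic=self.italic_depth > 0)
            )


def extract_runs(html: str) -> list[TextRun]:
    """Parse an HTML string and return its text runs with formatting flags."""
    parser = FormattingParser()
    parser.feed(html)
    parser.close()
    return parser.runs


if __name__ == "__main__":
    page = "<div><strong>Installation</strong><p>Run <em>pip</em> install.</p></div>"
    for run in extract_runs(page):
        print(run)
```

With this approach the `<strong>` pseudo-header "Installation" is kept as a bold run rather than skipped.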
Alternatives
...
Thanks @remod for submitting this issue.
Formatted text in HTML is indeed skipped unless it is part of a paragraph or another supported tag.
This will be addressed soon, together with other formatting styles, once the data schema in docling-core supports it. A draft is in progress: https://github.com/DS4SD/docling-core/pull/182.
@ceberam Thanks for circling back, that's great to hear! And thank you for the great tool!
@remod please note that this request is still in our focus. A similar PR should be finalized and merged soon:
https://github.com/docling-project/docling/pull/1411. After that, we will ensure that HTML formatting such as italic or bold is propagated to the DoclingDocument (check Formatting(BaseModel) for the supported styles).