
Support <strong> tag in HTML

Open remod opened this issue 10 months ago • 3 comments

Requested feature

Right now, if an HTML page contains a pseudo-header which is wrapped with a <strong> tag, docling skips it. An example page which contains such a tag can be found here.

Ideally, I think docling would include it as bold text.

Alternatives

...

remod avatar Mar 09 '25 22:03 remod

Thanks @remod for submitting this issue. Formatted text in HTML is indeed skipped unless it is part of a paragraph or another supported tag. This will be addressed soon, together with other formatting styles, once the data schema in docling-core supports it. There is a draft in progress (https://github.com/DS4SD/docling-core/pull/182).

ceberam avatar Mar 10 '25 10:03 ceberam

@ceberam Thanks for circling back, that's great to hear! And thank you for the great tool!

remod avatar Mar 10 '25 10:03 remod

@remod please note that we still have this request in focus. There is a similar PR that should be finalized and merged soon, https://github.com/docling-project/docling/pull/1411. After that, we will ensure that HTML formatting like italic or bold is propagated to the DoclingDocument (check Formatting(BaseModel) for the supported styles).

ceberam avatar May 26 '25 06:05 ceberam
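
For readers following the thread, here is a minimal sketch of what "propagating" bold and italic could look like. It is not docling's HTML backend and does not use docling's API: it relies only on Python's standard `html.parser`, and the `Style` dataclass, `StyledTextParser`, and `extract_runs` are hypothetical names made up for illustration (`Style` is a two-flag stand-in loosely modeled on the idea of a formatting record, not the actual `Formatting` model). The point is that a standalone `<strong>` outside any paragraph still yields a text run, tagged as bold, instead of being skipped.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Style:
    # Hypothetical stand-in for a formatting record; only two flags are modeled.
    bold: bool = False
    italic: bool = False


class StyledTextParser(HTMLParser):
    """Collects (text, Style) runs, tracking nesting depth of bold/italic tags."""

    BOLD_TAGS = {"strong", "b"}
    ITALIC_TAGS = {"em", "i"}

    def __init__(self):
        super().__init__()
        self.bold_depth = 0
        self.italic_depth = 0
        self.runs = []  # list of (text, Style) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.bold_depth += 1
        elif tag in self.ITALIC_TAGS:
            self.italic_depth += 1

    def handle_endtag(self, tag):
        # Clamp at zero so a stray closing tag cannot corrupt the state.
        if tag in self.BOLD_TAGS:
            self.bold_depth = max(0, self.bold_depth - 1)
        elif tag in self.ITALIC_TAGS:
            self.italic_depth = max(0, self.italic_depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only nodes between tags
            style = Style(bold=self.bold_depth > 0, italic=self.italic_depth > 0)
            self.runs.append((text, style))


def extract_runs(html):
    """Return every non-empty text run with the style active at that point."""
    parser = StyledTextParser()
    parser.feed(html)
    parser.close()
    return parser.runs
```

Calling `extract_runs("<div><strong>Pseudo header</strong></div><p>Body <em>text</em></p>")` keeps the pseudo-header as a bold run alongside the paragraph text. A real implementation would also need to handle block structure, whitespace between inline runs, and CSS-based styling, none of which this sketch attempts.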