Support export of DoclingDocument to HTML
Requested feature
Docling can currently export to JSON, Markdown and Doctags. Exporting to plain HTML would be a useful addition, because it renders nicely in any browser and correctly displays table structures with spans. This feature must be implemented in docling-core as a method DoclingDocument.export_to_html.
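To illustrate what "table structures with spans" means for the HTML output, here is a minimal sketch. It does not use the docling-core data model: the `Cell` dataclass and the `table_to_html` helper are hypothetical names made up for this example, and the real `export_to_html` method may look quite different.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Cell:
    """Hypothetical table cell for illustration; not the docling-core model."""

    text: str
    row_span: int = 1
    col_span: int = 1
    header: bool = False


def table_to_html(rows: List[List[Cell]]) -> str:
    """Render rows of cells as an HTML table, adding span attributes only when > 1."""
    lines: List[str] = ["<table>"]
    for row in rows:
        lines.append("  <tr>")
        for cell in row:
            tag = "th" if cell.header else "td"
            attrs = ""
            if cell.row_span > 1:
                attrs += f' rowspan="{cell.row_span}"'
            if cell.col_span > 1:
                attrs += f' colspan="{cell.col_span}"'
            # Escape the cell text so characters like & and < stay valid HTML.
            lines.append(f"    <{tag}{attrs}>{escape(cell.text)}</{tag}>")
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# A header cell spanning two rows, next to a header cell spanning two columns.
example = [
    [Cell("Region", header=True, row_span=2), Cell("Sales", header=True, col_span=2)],
    [Cell("Q1", header=True), Cell("Q2", header=True)],
    [Cell("A & B"), Cell("10"), Cell("12")],
]
html_out = table_to_html(example)
print(html_out)
```

Markdown tables have no equivalent of `rowspan` and `colspan`, which is why merged cells like these are easier to represent faithfully in HTML.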
Hi @cau-git, Please assign this issue to me, I'm interested in working on it.
@taufikus Thanks for your interest in contributing to this issue! You are very welcome to create a proposal and submit a PR for our review.
Please note that since this issue is touching a core component of Docling, the code must strongly adhere to the contribution guidelines and needs decent test coverage. Hence I am summarizing a few recommendations below:
- Please take the `export_to_markdown` method as a blueprint in terms of arguments and general code structure.
- Place type hints on method signatures and inside the code.
- Make sure to install the pre-commit hooks (`poetry run pre-commit install`) before you commit, such that any commits are validated with the toolchain.
- Add unit tests in `docling_core` to specifically test the features of your HTML export method.
I am assigning you to this issue, please let me know if you want to proceed.
@taufikus Do you have an update on this?
@PeterStaar-IBM Sorry for the delay on this task. I've been working on the issue and have made a little progress, but it's turning out to be more complex than initially anticipated. I guess I need more time on this. If someone else wants to take this on, you can assign it to them as well. Until then, I will keep working on it and try to come up with a solution.