Potential feature: Support rowspan in table rendering
Nice library, thanks!
Have you thought about implementing "rowspan" in tables?
Example:
<table>
<tr>
<td>Top left</td>
<td rowspan="2">Whole right</td>
</tr>
<tr>
<td>Bottom left</td>
</tr>
</table>
I would imagine RenderTable gets quite a bit more complicated to handle rowspan & colspan.
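To make the extra complexity concrete, here is a minimal sketch of the grid bookkeeping that rowspan and colspan require. It is not how html2text's RenderTable works; it is a standalone Python illustration using only the standard library, and the names `TableGrid` and `table_to_grid` are hypothetical. It assumes a single, non-nested table with integer span attributes.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Place <td>/<th> cells on a grid, honouring rowspan and colspan."""

    def __init__(self):
        super().__init__()
        self.grid = {}    # (row, col) -> cell text; spanned slots share the text
        self.row = -1     # index of the current <tr>
        self.col = 0      # next candidate column in the current row
        self.span = None  # (rowspan, colspan) of the currently open cell
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self.text = []

    def handle_data(self, data):
        if self.span is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.span is not None:
            rows, cols = self.span
            # Skip slots already claimed by a rowspan from an earlier row.
            while (self.row, self.col) in self.grid:
                self.col += 1
            text = "".join(self.text).strip()
            for r in range(rows):
                for c in range(cols):
                    self.grid[(self.row + r, self.col + c)] = text
            self.col += cols
            self.span = None


def table_to_grid(html):
    """Return the table as a list of rows, repeating text in spanned slots."""
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    if not parser.grid:
        return []
    n_rows = max(r for r, _ in parser.grid) + 1
    n_cols = max(c for _, c in parser.grid) + 1
    return [[parser.grid.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]
```

The key step is the `while` loop: a cell in a later row cannot simply take the next column, because an earlier rowspan may already occupy it. A real renderer would also have to decide how to draw borders across the merged slots and how to handle nested tables, which this sketch ignores.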
This is just something that hasn't come up. I'll have a look when I get a bit of time. Out of interest, do you have a particular need or use case, or is this more of a nice-to-have?
Hi @jugglerchris,
I am looking at a doc pre-processing pipeline to feed LLMs to extract structured data.
The source DOCX have work procedures in them, which are typically laid out in nested tables with merged cells to support the grouping and layout.
The original plan was DOCX -> Markdown via pandoc, but nested tables break badly as Markdown (take your pick of flavour) doesn't really support them.
So I looked at DOCX -> HTML via pandoc (tables come through very nicely) and was then looking at converting this to text to make it easier for the LLM to digest. Rowspan has semantic meaning in terms of grouping the information to be extracted, so I wanted to preserve it if possible.
At the moment I'm investigating feeding the LLM the HTML directly. It has some advantages over text (e.g. tagging each node with an id so you can get the LLM to definitively tell you how it made stuff up), but without a lot of post-processing (stripping styling, combining spans/strongs, etc.) the display for user review can be pretty nasty compared with the plain-text, fixed-width output generated by html2text.
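For anyone curious what the id tagging could look like, here is a rough sketch using only Python's standard library. `IdTagger` and `tag_nodes` are hypothetical names, not part of pandoc or html2text. It assumes an HTML fragment without scripts, drops comments and doctype declarations, removes `style` attributes, and overwrites any existing `id`, which would break internal links if the document had them.

```python
from html import escape
from html.parser import HTMLParser


class IdTagger(HTMLParser):
    """Re-emit HTML with a sequential id on every element and no inline style."""

    def __init__(self, prefix="n"):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.prefix = prefix
        self.count = 0
        self.out = []

    def _open(self, tag, attrs, close=""):
        self.count += 1
        # Drop inline styling and any old id, then add the new id first.
        kept = [(k, v) for k, v in attrs if k not in ("id", "style")]
        kept.insert(0, ("id", f"{self.prefix}{self.count}"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in kept
        )
        self.out.append(f"<{tag}{rendered}{close}>")

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, close=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text, since the parser has already decoded entities.
        self.out.append(escape(data, quote=False))


def tag_nodes(html, prefix="n"):
    """Return the HTML with ids n1, n2, ... assigned in document order."""
    tagger = IdTagger(prefix)
    tagger.feed(html)
    tagger.close()
    return "".join(tagger.out)
```

With ids in place, the LLM can be asked to cite the id of the node each extracted value came from, which makes its answers checkable against the source.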
TL;DR: No particular need with the current alternative approach, but hopefully the original use case makes sense. If it were available, I'd definitely look at improving my UI!
BTW - I came across your library through @fuelen's Elixir bindings at https://github.com/fuelen/html2text
I believe I've now got rowspan working reasonably in #239. There will likely be some strange edge cases that don't work well, but a few slightly awkward examples work as I'd expect!