
Potential feature: Support rowspan in table rendering

Open mindok opened this issue 6 months ago • 3 comments

Nice library thanks!

Have you thought about implementing "rowspan" in tables?

Example:

<table>
  <tr>
    <td>Top left</td>
    <td rowspan="2">Whole right</td>
  </tr>
  <tr>
    <td>Bottom left</td>
  </tr>
</table>

I would imagine RenderTable gets quite a bit more complicated to handle rowspan & colspan.
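
To make the extra complexity concrete, here is a minimal sketch of the bookkeeping that rowspan and colspan require: each source cell has to be placed on a rectangular grid, and a cell with `rowspan="2"` claims a slot in the following row as well. The `Cell` type and `layout` function are hypothetical names for illustration only; this is not html2text's `RenderTable` code, and it simply overwrites slots if spans overlap.

```rust
/// A table cell as written in the HTML source.
/// Hypothetical type for illustration; not part of html2text.
#[derive(Debug, Clone, PartialEq)]
struct Cell {
    text: String,
    rowspan: usize,
    colspan: usize,
}

impl Cell {
    fn new(text: &str, rowspan: usize, colspan: usize) -> Self {
        // Treat a span of 0 as 1 to keep the sketch simple.
        Cell {
            text: text.to_string(),
            rowspan: rowspan.max(1),
            colspan: colspan.max(1),
        }
    }
}

/// Place every cell on a rectangular grid. Each slot holds the
/// (row, index-in-row) of the source cell covering it, or None if empty.
fn layout(rows: &[Vec<Cell>]) -> Vec<Vec<Option<(usize, usize)>>> {
    let mut grid: Vec<Vec<Option<(usize, usize)>>> = vec![Vec::new(); rows.len()];
    for (r, row) in rows.iter().enumerate() {
        let mut col = 0;
        for (i, cell) in row.iter().enumerate() {
            // Skip slots already claimed by a rowspan from an earlier row.
            while grid[r].get(col).map_or(false, |slot| slot.is_some()) {
                col += 1;
            }
            // Clamp the span so it never runs past the last row.
            let last_row = (r + cell.rowspan).min(rows.len());
            for rr in r..last_row {
                if grid[rr].len() < col + cell.colspan {
                    grid[rr].resize(col + cell.colspan, None);
                }
                for cc in col..col + cell.colspan {
                    grid[rr][cc] = Some((r, i));
                }
            }
            col += cell.colspan;
        }
    }
    // Pad every row to the same width so the grid is rectangular.
    let width = grid.iter().map(|row| row.len()).max().unwrap_or(0);
    for row in &mut grid {
        row.resize(width, None);
    }
    grid
}
```

For the example table above, both slots in the right-hand column would point back at the "Whole right" cell, so a renderer could draw it once and skip the horizontal border between the two rows on that side.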

mindok avatar Jun 25 '25 06:06 mindok

This is just something that hasn't come up. I'll have a look when I get a bit of time. Out of interest, do you have a particular need or use case, or is this more of a nice-to-have?

jugglerchris avatar Jun 26 '25 21:06 jugglerchris

Hi @jugglerchris,

I am looking at a document pre-processing pipeline that feeds LLMs to extract structured data.

The source DOCX files contain work procedures, which are typically laid out in nested tables with merged cells to support the grouping and layout.

The original plan was DOCX -> Markdown via pandoc, but nested tables break badly as Markdown (take your pick of flavour) doesn't really support them.

So I looked at DOCX -> HTML via pandoc (tables come through very nicely) and was then looking at converting this to text to make it easier for the LLM to digest. Rowspan has semantic meaning in terms of grouping the information to be extracted, so I wanted to preserve it if possible.

At the moment I'm investigating feeding the LLM the HTML directly. It has some advantages over text (e.g. tagging each node with an id so you can get the LLM to definitively tell you how it made stuff up), but without a lot of post-processing (stripping styling, combining spans/strongs, etc.) the display for user review can be pretty nasty compared with the plain-text, fixed-width output generated by html2text.

TL;DR: No particular need with the current alternative approach, but hopefully the original use case makes sense. If it were available, I'd definitely look at improving my UI!

BTW - I came across your library through @fuelen's Elixir bindings at https://github.com/fuelen/html2text

mindok avatar Jun 27 '25 00:06 mindok

I believe I've now got rowspan working reasonably well in #239. There will likely be some strange edge cases that don't work well, but a few slightly awkward examples work as I'd expect!

jugglerchris avatar Sep 07 '25 08:09 jugglerchris