html-to-markdown
[Feature]: alternative table syntax for LLMs
Describe the improvement
Hi, I see you are working on table support right now. That is awesome. I actually need this right now for tables where the cells contain simple markup and all rows have the same number of cells, with no spans, etc. Is it possible to use it for that yet? Many thanks
I hacked it into the CLI flags, like the strikethrough one. It seems OK so far.
Wow you noticed it fast 👍
Just added --plugin-table to the CLI on the main branch. Please let me know how it works!
@sparkprime The tables are now live with 2.3.0 🥳
I added one feature: an option to disable column-width padding, both the spaces inside the cells and the dashes (-) below the heading. My reasoning is that LLMs don't really understand the width of tokens and therefore would not "see" the alignment, so the padding might just be wasted tokens.
I would submit as a PR but I'd have to work through my organization's process for that and it may not be worth it for such a simple change. But I thought you might be interested to hear that I did it.
@sparkprime Oh that is really interesting!
|           |      |
|-----------|------|
| Something | Text |
| B1        | B2   |
So you mean instead of the above it is either:
a)
|         |    |
|---------|----|
|Something|Text|
|B1       |B2  |
b)
| | |
|---------|----|
|Something|Text|
|B1|B2|
Yes, except that the dashes under the header are not padded either: there are always exactly 3 per column.
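The compact layout described above (no padding in the cells, always three dashes per column in the separator row) can be sketched in a few lines of Go. This is only an illustration, not the actual change made to html-to-markdown and not part of its API: `renderCompactTable` is a hypothetical helper, and it assumes the cells are already rendered to Markdown, contain no `|` or newlines, and that every row has the same number of cells as the header.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCompactTable is a hypothetical helper (not part of html-to-markdown).
// It writes a Markdown table without any column-width padding.
// Assumption: cells contain no '|' characters or newlines.
func renderCompactTable(header []string, rows [][]string) string {
	var sb strings.Builder

	writeRow := func(cells []string) {
		sb.WriteString("|")
		for _, cell := range cells {
			if cell == "" {
				// Assumption: keep a single space so an empty cell stays visible.
				cell = " "
			}
			sb.WriteString(cell)
			sb.WriteString("|")
		}
		sb.WriteString("\n")
	}

	writeRow(header)

	// The separator row always uses exactly three dashes per column,
	// regardless of how wide the cell contents are.
	separator := make([]string, len(header))
	for i := range separator {
		separator[i] = "---"
	}
	writeRow(separator)

	for _, row := range rows {
		writeRow(row)
	}
	return sb.String()
}

func main() {
	fmt.Print(renderCompactTable(
		[]string{"", ""},
		[][]string{{"Something", "Text"}, {"B1", "B2"}},
	))
	// Prints:
	// | | |
	// |---|---|
	// |Something|Text|
	// |B1|B2|
}
```

Compared with the padded table at the top of the thread, this output drops every alignment space and shortens the separator row, which is where the token savings would come from.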