turndown-plugin-gfm
turndown-plugin-gfm copied to clipboard
Handle tables with no headers by creating an empty Markdown header
The current behaviour of the table plugin is to return the full HTML if the table has no header. However this is an issue if the user wants actual Markdown regardless of the table layout, especially when the original HTML has many classes or attributes, which makes the text unreadable.
In this pull request, if a table with no header is detected, an empty header is added instead, so it would turn something like this:
<table>
<tr><td>Row 1 Cell 1</td><td>Row 1 Cell 2</td></tr>
<tr><td>Row 2 Cell 1</td><td>Row 2 Cell 2</td></tr>
</table>
Into this:
| | |
| --- | --- |
| Row 1 Cell 1 | Row 1 Cell 2 |
| Row 2 Cell 1 | Row 2 Cell 2 |
Which will render "correctly" to:
| Row 1 Cell 1 | Row 1 Cell 2 |
| Row 2 Cell 1 | Row 2 Cell 2 |
Although the empty header is not ideal it still is valid Markdown.
Hi Laurent. Thanks for your contribution. Interesting idea! I am a little wary about merging changes that add markup once the resulting markdown is converted to HTML. This change results in nicer markdown, but potentially awkward HTML once converted, so I am unsure about it at present. I will have a think about it. Thanks again.
No problem, converting from HTML to Markdown is tricky and what could be considered "correct" might be application-dependent. In my case, I want as little HTML as possible in the resulting markdown (except for BR which are needed to add newlines in tables) so I'm fine if HTML is added once converted back.
As I use your lib quite heavily for the Joplin web clipper I've ended creating a fork of it that I use only for this app here and here. It basically allows me to quickly add changes that might not be acceptable for a general-purpose lib. There are also a few bug fixes so if there's anything that could be of use feel free to cherry pick from it.
And thanks for this very useful package! The architecture is much better than what I had done before and it makes it easy to add new features.
Just to weigh in here, I prefer the behavior of turndown to keep gfm-invalid tables (without headers) as HTML tables. It is what GitHub does in this implemenation and results in quite clear behavior. If this were merged, I would prefer a way to disable it.
any update on this issue?
FYI I am using @laurent22 's fork and seems to work fine for tables 👌