node-html-to-text

Invalid HTML should be handled as well as possible

Open spotlesscoder opened this issue 5 months ago • 3 comments

Minimal HTML example

<p>subject321</p><html><head></head><body>body123</body></html>

Options

default options

Observed output

body123

Expected output

subject321body123

Version information

  • html-to-text: 9.0.5
  • node:

spotlesscoder, Jul 25 '25 13:07

Context: Stray HTML tags like this can occur because our code only receives strings in which header lines and bodies are already concatenated, and we cannot split them reliably.

spotlesscoder, Jul 25 '25 13:07

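const { htmlToText } = require('html-to-text');
// An empty selector list never matches, so the entire DOM is converted.
const options = { baseElements: { selectors: [] } };
const text = htmlToText(html, options);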

will result in

subject321

body123

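baseElements.returnDomByDefault is true by default, but the entire DOM is only used as a fallback when none of the baseElements.selectors match. By default, the selectors are ['body'], which matches part of your input, so only the body content is converted and the leading <p> is dropped. To always use the entire DOM, baseElements.selectors must never match.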

https://github.com/html-to-text/node-html-to-text/blob/master/packages/html-to-text/README.md#options
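If the whole-document behaviour has to be combined with other options, a small wrapper can keep the empty selector list in one place. This is only a sketch: wholeDocumentOptions is a hypothetical helper name, not part of html-to-text, and it relies on nothing beyond the baseElements.selectors and baseElements.returnDomByDefault options described above.

```javascript
// Hypothetical helper (not part of html-to-text): returns a copy of the given
// options in which baseElements.selectors is empty, so no base element can
// match and the converter falls back to the entire DOM.
function wholeDocumentOptions(extraOptions = {}) {
  return {
    ...extraOptions,
    baseElements: {
      ...(extraOptions.baseElements || {}),
      selectors: [],            // never matches anything in the input
      returnDomByDefault: true  // already the default; set explicitly for clarity
    }
  };
}

// Usage, assuming html-to-text is installed:
// const { htmlToText } = require('html-to-text');
// const text = htmlToText(html, wholeDocumentOptions({ wordwrap: 80 }));
```

The helper copies the options instead of mutating them, so an options object shared with other call sites keeps its own selectors.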

KillyMXI, Jul 25 '25 13:07

Oh that's really good to know. I think this should be stated more prominently in the Readme 👍

spotlesscoder, Jul 25 '25 14:07