node-html-parser
A very fast HTML parser, generating a simplified DOM, with basic element query support.
``` test ``` Parse the above and then retrieve the document text from outerHTML. The `` will be dropped, making the end `` tag immediately follow ``: ```
For example, ``` testtest ``` The extra whitespace between `html` and `lang="en"` and the extra newlines inside tags aren't removed, even if I use `removeWhitespace`. All regular browser implementations remove...
Two parsing options are missing. That commit adds them to the README file.
Simple code hangs, causing kube to kill the pod: ``` import {parse} from 'node-html-parser'; const html = // load https://www.a1supplements.com/ const root = parse(html); ``` I use: ``` "node-html-parser": "^6.1.10", ```...
Would it be possible to have a `replaceWith` method added to text nodes? Example use case: a text node has content 'word1 word2 word3' and I would like to replace it with...
Enquiry
Can this library be used to remove all HTML tags leaving only the text alone? If yes, how?
I have the following code: ```js const elements = htmlOrXml.map(s => parse(s.trim())); console.log(elements[5].structure, `'${htmlOrXml[5]}'`); ``` Outputting this: ``` null button#button.style-scope.yt-icon-button yt-icon#guide-icon.style-scope.ytd-masthead yt-icon-shape.style-scope.yt-icon icon-shape.yt-spec-icon-shape div svg path ' ' ``` Is...
I'm on Node v21.5.0 and version 6.1.12 of the library. I have a pretty simple test case that I believe should work, judging from the issue about adding comment support...
Not familiar w/ this repo, but _I think_ `rawTagName` simply needs to be added [here](https://github.com/taoqf/node-html-parser/blob/main/src/nodes/node.ts), as `string | undefined`?
I know it's hard to predict every possible kind of malformed HTML, but I came across this while scraping a website. The misplaced apostrophe before the `>` of the `` makes the...