Breaking change in 1.0.0: htmlparser2 mode self-closes empty tags

Open nwalters512 opened this issue 1 year ago • 1 comments

Reproduction: https://github.com/nwalters512/cheerio-self-closing-repro

Code for reference:

import * as oldCheerio from 'cheerio-rc/lib/slim';
import * as newCheerio from 'cheerio-1/slim';

const HTML = '<html><head></head><body><div></div></body></html>';

console.log(oldCheerio.load(HTML).html());
console.log(newCheerio.load(HTML).html());

console.log(oldCheerio.load(HTML, { recognizeSelfClosing: true }).html());
console.log(newCheerio.load(HTML, { xml: { recognizeSelfClosing: true } }).html());

Steps to reproduce:

  • Clone the repository.
  • Run yarn.
  • Run node index.js.

Observe the following output is printed:

<html><head></head><body><div></div></body></html>
<html><head></head><body><div></div></body></html>
<html><head></head><body><div></div></body></html>
<html><head/><body><div/></body></html>

Specifically, note that in the fourth line of output (the 1.0.0 call with the xml option), <head> and <div> were serialized as self-closing tags.
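For anyone wanting to catch this kind of change in a test suite, here is a rough sketch of a check over serialized output. The helper name findSelfClosedTags and the regex are hypothetical and not part of cheerio; it is a simple pattern match, not a parser, so treat it as an approximation.

```javascript
// Hypothetical helper (not part of cheerio): list non-void elements that
// appear in self-closing form, e.g. <div/>, in a serialized string.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'source', 'track', 'wbr',
]);

function findSelfClosedTags(html) {
  const found = [];
  // Match "<name", optional attributes, then "/>". Regex-based, so only
  // an approximation of real tag parsing.
  const pattern = /<([a-zA-Z][\w:-]*)(?:\s[^<>]*?)?\s*\/>/g;
  for (const match of html.matchAll(pattern)) {
    const name = match[1].toLowerCase();
    // Void elements such as <br/> are legitimately self-closing in HTML.
    if (!VOID_ELEMENTS.has(name)) found.push(name);
  }
  return found;
}

// The two distinct outputs from the reproduction above.
const before = '<html><head></head><body><div></div></body></html>';
const after = '<html><head/><body><div/></body></html>';

console.log(findSelfClosedTags(before)); // []
console.log(findSelfClosedTags(after)); // [ 'head', 'div' ]
```

Asserting that the returned list is empty for HTML output would flag the difference between the third and fourth lines of the reproduction.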

I'm not sure if this should be considered a bug or not, but it appears to be a breaking change and it isn't called out anywhere in the release notes or upgrade guide: https://cheerio.js.org/blog/cheerio-1.0

nwalters512 avatar Aug 17 '24 18:08 nwalters512

It seems that things work as expected if I change the last line to the following (adding xmlMode: false):

console.log(newCheerio.load(HTML, { xml: { recognizeSelfClosing: true, xmlMode: false } }).html());

This doesn't make sense given the configuration documentation (https://cheerio.js.org/docs/advanced/configuring-cheerio#using-htmlparser2-for-html) which states:

You can also use Cheerio's slim export, which always uses htmlparser2. This avoids loading parse5, which saves some bytes eg. in browser environments:

That is, I would expect to not have to set xmlMode: false when using the "slim" export. Do I in fact have to set xmlMode: false even in that case?

nwalters512 avatar Aug 17 '24 18:08 nwalters512

"Do I in fact have to set xmlMode: false even in that case?"

You don't!

fb55 avatar Jun 08 '25 19:06 fb55

Sorry, the resolution here isn't clear to me. Is this a bug, an intentional change, user error, or am I just misunderstanding something?

nwalters512 avatar Jun 08 '25 20:06 nwalters512

Intentional; if you parse XML, you will get self-closing tags. The slim export can work around this, as can disabling xml mode.

fb55 avatar Jun 09 '25 08:06 fb55

Ah, does the mere presence of the xml key in the reproducing example mean that the input is parsed as XML? I don't believe that was the case pre-v1, as my example shows, and the change wasn't called out in the changelog. However, I see in the docs that xmlMode does indeed default to true when the xml key is present.

nwalters512 avatar Jun 09 '25 15:06 nwalters512
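To summarize the behavior discussed in this thread, here is a minimal sketch of how the xml option appears to resolve to an XML or HTML mode in 1.0.0. The function resolveXmlMode is a hypothetical model written for illustration, not cheerio's actual source. It assumes that xml may be either true or an options object, and it ignores every other option.

```javascript
// Hypothetical model of the option handling described in this thread.
// This is NOT cheerio's implementation; the name is made up.
function resolveXmlMode(options = {}) {
  const { xml } = options;
  // No `xml` key (or a falsy value): the input is treated as HTML.
  if (!xml) return false;
  // Assumption: `xml: true` means "parse as XML".
  if (xml === true) return true;
  // `xml: { ... }`: xmlMode defaults to true unless set explicitly.
  return xml.xmlMode ?? true;
}

console.log(resolveXmlMode({})); // false
console.log(resolveXmlMode({ xml: { recognizeSelfClosing: true } })); // true
console.log(
  resolveXmlMode({ xml: { recognizeSelfClosing: true, xmlMode: false } }),
); // false
```

Under this model, the fourth call in the reproduction runs in XML mode because the xml key is present, which matches the self-closing output. Adding xmlMode: false switches it back to HTML serialization.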