Breaking change in 1.0.0: htmlparser2 mode self-closes empty tags
Reproduction: https://github.com/nwalters512/cheerio-self-closing-repro
Code for reference:
import * as oldCheerio from 'cheerio-rc/lib/slim';
import * as newCheerio from 'cheerio-1/slim';
const HTML = '<html><head></head><body><div></div></body></html>';
console.log(oldCheerio.load(HTML).html());
console.log(newCheerio.load(HTML).html());
console.log(oldCheerio.load(HTML, { recognizeSelfClosing: true }).html());
console.log(newCheerio.load(HTML, { xml: { recognizeSelfClosing: true } }).html());
Steps to reproduce:
- Clone the repository.
- Run yarn.
- Run node index.js.
Observe the following output is printed:
<html><head></head><body><div></div></body></html>
<html><head></head><body><div></div></body></html>
<html><head></head><body><div></div></body></html>
<html><head/><body><div/></body></html>
Specifically, note that <head> and <div> were serialized as self-closing tags.
I'm not sure if this should be considered a bug or not, but it appears to be a breaking change and it isn't called out anywhere in the release notes or upgrade guide: https://cheerio.js.org/blog/cheerio-1.0
It seems that things work as expected if I change the last line to the following (adding xmlMode: false):
console.log(newCheerio.load(HTML, { xml: { recognizeSelfClosing: true, xmlMode: false } }).html());
This doesn't make sense given the configuration documentation (https://cheerio.js.org/docs/advanced/configuring-cheerio#using-htmlparser2-for-html) which states:
You can also use Cheerio's slim export, which always uses htmlparser2. This avoids loading parse5, which saves some bytes eg. in browser environments:
That is, I would expect to not have to set xmlMode: false when using the "slim" export. Do I in fact have to set xmlMode: false even in that case?
Do I in fact have to set xmlMode: false even in that case?
You don't!
Sorry, the resolution here isn't clear to me. Is this a bug, an intentional change, user error, or am I just misunderstanding something?
Intentional; if you parse XML, you will get self-closing tags. The slim export can work around this, as can disabling xml mode.
Ah, does the mere presence of the xml key in the reproducing example mean the document is parsed as XML? I don't believe that was the case in pre-v1, as my example shows, and that wasn't called out in the changelog. However, I see in the docs that xmlMode does indeed default to true when the xml key is present.
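For anyone else landing here, this is how I now understand the option handling, written as a small standalone sketch. It is a simplified model and not Cheerio's actual source: resolveXmlMode is a hypothetical helper I made up for illustration, and it assumes only the behavior described in this thread and in the docs (xmlMode defaults to true whenever the xml key is present, unless it is explicitly overridden inside that object).

```javascript
// Hypothetical, simplified model of how the options decide XML mode.
// NOT Cheerio's implementation; it only mirrors the behavior discussed above.
function resolveXmlMode(options = {}) {
  // No `xml` key (or a falsy one): only a top-level `xmlMode` flag can enable XML mode.
  if (options.xml === undefined || options.xml === null || options.xml === false) {
    return Boolean(options.xmlMode);
  }
  // `xml: true` enables XML mode with no overrides.
  if (options.xml === true) {
    return true;
  }
  // `xml: { ... }`: XML mode defaults to true unless the object sets `xmlMode` itself.
  return options.xml.xmlMode ?? true;
}

// No xml key: HTML serialization, empty tags keep their closing tags.
console.log(resolveXmlMode({})); // false
// The reproducing call: the xml key alone switches XML mode on.
console.log(resolveXmlMode({ xml: { recognizeSelfClosing: true } })); // true
// The workaround: an explicit xmlMode: false inside xml switches it back off.
console.log(resolveXmlMode({ xml: { recognizeSelfClosing: true, xmlMode: false } })); // false
```

Under this model, the fourth line of output in the reproduction is self-closing because the xml key was passed, not because the slim export was used.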