Weird behaviour with HTML entities within XML content

Open Ponynjaa opened this issue 4 months ago • 0 comments

When I have this input: <root><table><div>test&nbsp;</div></table></root> and I run this code:

import * as cheerio from 'cheerio';

const input = `<root><table><div>test&nbsp;</div></table></root>`;
const $ = cheerio.load(input, {
	xmlMode: false,
	decodeEntities: false
}, false);
console.log($.xml()); // -> "<root><div>test </div><table/></root>"

This moves the <div> out of the <table> and leaves the table empty, which is a behaviour I don't want, so I set xmlMode: true like so:

import * as cheerio from 'cheerio';

const input = `<root><table><div>test&nbsp;</div></table></root>`;
const $ = cheerio.load(input, {
	xmlMode: true,
	decodeEntities: false
}, false);
console.log($.xml()); // -> "<root><table><div>test&nbsp;</div></table></root>"

Now the <div> stays inside the <table>, but &nbsp; is no longer decoded to \u00a0. If I then set decodeEntities: true, the entity gets escaped a second time instead:

import * as cheerio from 'cheerio';

const input = `<root><table><div>test&nbsp;</div></table></root>`;
const $ = cheerio.load(input, {
	xmlMode: true,
	decodeEntities: true
}, false);
console.log($.xml()); // -> "<root><table><div>test&amp;nbsp;</div></table></root>"

My current workaround is to use the libraries htmlparser2 and dom-serializer separately like so:

import * as htmlparser2 from 'htmlparser2';
import * as domserializer from 'dom-serializer';

const input = `<root><table><div>test&nbsp;</div></table></root>`;
const parsed = htmlparser2.parseDocument(input, {
	xmlMode: false,
	decodeEntities: true
});

const serialized = domserializer.render(parsed, {
	xmlMode: false,
	encodeEntities: false,
	decodeEntities: true
});

console.log(serialized); // -> "<root><table><div>test </div></table></root>"

This behaviour is weird, and I can't really tell where the error is happening, but I suspect it comes from the lack of options that can be passed to the serializer.

Ponynjaa, Mar 01 '24 09:03