cheerio
Weird behaviour with HTML entities within XML content
When I have this input: <root><table><div>test&nbsp;</div></table></root>
and I run this code:
import * as cheerio from 'cheerio';
const input = `<root><table><div>test&nbsp;</div></table></root>`;
// HTML mode: the markup goes through HTML parsing rules.
const $ = cheerio.load(input, {
  xmlMode: false,
  decodeEntities: false
}, false);
console.log($.xml()); // -> "<root><div>test </div><table/></root>"
it moves the div out of the table, which is a behaviour I don't want, so I use xmlMode=true like so:
import * as cheerio from 'cheerio';
const input = `<root><table><div>test&nbsp;</div></table></root>`;
// XML mode: the element structure is kept as written.
const $ = cheerio.load(input, {
  xmlMode: true,
  decodeEntities: false
}, false);
console.log($.xml()); // -> "<root><table><div>test&nbsp;</div></table></root>"
Now the table didn't get moved in the result, but the &nbsp; doesn't get decoded to \u00a0 anymore. If I then try to use decodeEntities=true, it encodes it even more:
import * as cheerio from 'cheerio';
const input = `<root><table><div>test&nbsp;</div></table></root>`;
// XML mode with entity decoding turned on.
const $ = cheerio.load(input, {
  xmlMode: true,
  decodeEntities: true
}, false);
console.log($.xml()); // -> "<root><table><div>test&amp;nbsp;</div></table></root>"
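One possible explanation for the extra level of encoding, sketched without cheerio: XML defines only five named entities, so an XML-mode parser would leave the HTML-only nbsp entity in the text as literal characters, and a serializer that escapes every ampersand would then encode it again. The helper names below (decodeXmlEntities, encodeXmlText) are hypothetical and only illustrate that assumed mechanism; they are not cheerio's actual code.

```javascript
// Illustrative sketch only: hypothetical helpers, NOT cheerio's implementation.
// XML defines just five named entities; &nbsp; is an HTML entity, not an XML one.
const XML_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Decode only the named entities an XML-mode parser is assumed to know.
// Unknown names (such as nbsp) are left in the text untouched.
function decodeXmlEntities(text) {
  return text.replace(/&(\w+);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(XML_ENTITIES, name)
      ? XML_ENTITIES[name]
      : match
  );
}

// Escape text for XML output, as a serializer with entity encoding
// enabled is assumed to do: every remaining "&" becomes "&amp;".
function encodeXmlText(text) {
  const escapes = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
  return text.replace(/[&<>]/g, (ch) => escapes[ch]);
}

const source = 'test&nbsp;';
const textNode = decodeXmlEntities(source); // still "test&nbsp;"
const output = encodeXmlText(textNode);     // "test&amp;nbsp;"
console.log(output);
```

If this is what happens, the parser and the serializer each behave consistently on their own, and the problem is only that the serializer's entity handling cannot be configured separately.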
My current workaround is to use the libraries htmlparser2 and dom-serializer separately, like so:
import * as htmlparser2 from 'htmlparser2';
import * as domserializer from 'dom-serializer';
const input = `<root><table><div>test&nbsp;</div></table></root>`;
// Parse with htmlparser2 directly, decoding HTML entities such as &nbsp;.
const parsed = htmlparser2.parseDocument(input, {
  xmlMode: false,
  decodeEntities: true
});
// Serialize without re-encoding entities.
const serialized = domserializer.render(parsed, {
  xmlMode: false,
  encodeEntities: false,
  decodeEntities: true
});
console.log(serialized); // -> "<root><table><div>test </div></table></root>"
It is weird behaviour and I can't really tell where the error is happening, but I suppose it's the lack of options to pass to the serializer.