Parsing a self-closing tag produces bad HTML
The self-closing tag is removed.
Example:
cheerio.load(`<div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com/?rel=0"/>
</div>`).html()
This produces:
'<html><head></head><body><div class="n-content-video n-content-video--youtube"> <iframe src="https://www.youtube.com/?rel=0"> </div></iframe></div></body></html>'
As you can see, the closing slash of the iframe has disappeared, producing bad HTML syntax.
The way to resolve this should be xmlMode or, as here, the recognizeSelfClosing option:
cheerio.load(`<div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com/?rel=0"/>
</div>`,{recognizeSelfClosing : true}).html()
But even with this configuration the DOM tree is not represented correctly;
the recognizeSelfClosing option also produces bad HTML syntax:
'<html><head></head><body><div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com/?rel=0">
</div></iframe></div></body></html>'
The original behaviour is in line with how browsers work: Try it in yours.
The problem comes from converting from XML to HTML:
cheerio.load(cheerio.load(`<div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com?rel=0"></iframe>
</div><div><div class="n-recommended"></div></div>`).xml()).html()
It produces:
'<html><head></head><body><div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com?rel=0">
</div><div><div class="n-recommended"/></div></body></html></iframe></div></body></html>'
Shouldn't the output text be the same as the input?
When I put the code from above into Chrome, I got:
<html><head></head><body><div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com?rel=0"></iframe>
</div><div><div class="n-recommended"></div></div>
</body></html>
So the browser actually closes the iframe before its parent div element.
Maybe you should decode the content first, so you can avoid this "repairing" functionality.
// decode self-closing tags as a fragment (selfclosedHTML holds the input markup)
const decodedHTML = cheerio.load(selfclosedHTML, { xmlMode: true }, false).html({ xmlMode: false });
// and now use it as regular HTML
console.info(cheerio.load(decodedHTML).html());
Result:
<html><head></head><body><div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com?rel=0"></iframe>
</div><div><div class="n-recommended"></div></div></body></html>
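If the XML round trip is not an option, another approach is to expand the self-closing tags in the string before handing it to the HTML parser. The following is only a rough sketch, not something cheerio provides: `expandSelfClosingTags` and `VOID_ELEMENTS` are hypothetical names, the regex assumes attribute values contain no `>` character, and it makes no attempt to skip comments or script content.

```javascript
// HTML void elements: these never take a closing tag, so they are left alone.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper (not part of cheerio): rewrites `<tag .../>` as
// `<tag ...></tag>` for non-void elements so an HTML parser sees an
// explicit closing tag. Assumes no `>` inside attribute values.
function expandSelfClosingTags(markup) {
  return markup.replace(
    /<([a-zA-Z][\w:-]*)((?:\s[^<>]*?)?)\s*\/>/g,
    (match, name, attrs) =>
      VOID_ELEMENTS.has(name.toLowerCase())
        ? match
        : `<${name}${attrs.trimEnd()}></${name}>`
  );
}

// The markup from the first example in this thread.
const input = `<div class="n-content-video n-content-video--youtube">
<iframe src="https://www.youtube.com/?rel=0"/>
</div>`;

const expanded = expandSelfClosingTags(input);
console.log(expanded);
```

The expanded string can then be passed to cheerio.load as regular HTML. A regex is fragile for arbitrary markup, so for anything beyond simple fragments the xmlMode round trip above is probably the safer choice.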