
CSS inlining creates invalid XML with single tags

Open matthiaskoenig opened this issue 4 years ago • 2 comments

CSS inlining removes the closing / of single (self-closing) tags. For example:

  • <hr /> is replaced with <hr>
  • <img ... /> is replaced with <img ...>

This makes the library unusable in contexts where valid XML is required. There is no need to remove the closing /, so this is most likely a bug in the transformation.
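To illustrate why the missing / matters, here is a minimal sketch that uses only the Python standard library. The helper name is_well_formed_xml and the two sample strings are made up for illustration; they are not output captured from css-inline.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if `markup` parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hand-written samples: the same fragment with and without the closing "/".
with_slash = '<div><hr /><img src="a.png" alt="" /></div>'
without_slash = '<div><hr><img src="a.png" alt=""></div>'

print(is_well_formed_xml(with_slash))
print(is_well_formed_xml(without_slash))
```

The second string is fine as HTML, but an XML parser treats hr and img as unclosed elements and rejects the fragment.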

Thanks for the great library. It would cover my use cases if the output stayed valid XML.

matthiaskoenig, Sep 11 '21 22:09

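As a workaround, lxml can be used to re-serialize the output as XML, which closes the single tags again:

        import css_inline
        # alias the lxml module so it does not shadow the `html` input string
        from lxml import etree, html as lxml_html

        # css inline (as a side effect removes the closing "/" of empty tags)
        html_inline = css_inline.inline(html, extra_css=css)

        # parse the result as HTML, then serialize it as XML so that
        # empty tags are self-closed again
        doc = lxml_html.fromstring(html_inline)
        html_inline = etree.tostring(doc, encoding="unicode")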

matthiaskoenig, Sep 12 '21 00:09

Hi, sorry for my late reply.

Indeed, the underlying machinery does not support XML at the moment, but there is a workaround we can use to emit proper XML, so maybe we can implement this properly. Not sure about the API though - maybe a separate argument to inline.

> There is no need to remove the closing / and is most likely a bug in the transformation.

Indeed, html5ever doesn't preserve the closing /, which I assume is fine for an HTML5 serializer. Still, <hr /> is valid HTML as well as valid XHTML, so it would be nice to at least have a config option for this in html5ever.

Stranger6667, Jan 17 '22 11:01

I added a note on this behavior to the README file and am going to close this issue, as I don't think there are any other action items.

Stranger6667, Nov 07 '22 10:11