
Allow configuration of self closing for tags

Open Artur- opened this issue 10 years ago • 8 comments

When using web components / custom elements, the tag is always marked as selfClosing=false, and I can see no way of changing this.

Artur- avatar Dec 04 '14 15:12 Artur-

I'm not a jsoup developer, just trying to help.

I believe it's mentioned here: http://jsoup.org/news/release-1.8.1

"Introduced the ability to choose between HTML and XML output, and made HTML the default. This means img tags are output as <img>, not <img />. XML is the default when using the XmlTreeBuilder. Control this with the Document.OutputSettings.syntax() method."
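The distinction the release note draws between HTML and XML output can be modelled roughly as below. This is a simplified sketch rather than jsoup's implementation: the class OutputSyntaxSketch, its Syntax enum, and the emptyElement method are hypothetical names invented for illustration.

```java
// Simplified model only, not jsoup's code. All names here are hypothetical.
final class OutputSyntaxSketch {
    enum Syntax { HTML, XML }

    private OutputSyntaxSketch() {}

    // Writes an element that has no children and no separate end tag, such as img.
    // In XML syntax it is written in self-closing form; in HTML syntax it is left open.
    static String emptyElement(String tagName, Syntax syntax) {
        if (syntax == Syntax.XML) {
            return "<" + tagName + " />";
        }
        return "<" + tagName + ">";
    }
}
```

In jsoup itself the choice is made through Document.OutputSettings.syntax(), as the quoted note says.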

Freddy12 avatar Dec 09 '14 21:12 Freddy12

The parser already marks unknown (i.e. custom) tags as selfClosing if they are encountered as such in the input, but there is no such support for tags created programmatically using Tag.valueOf. As a workaround, you could use something like Jsoup.parse("<my-tag />").body().child(0).tag() instead of Tag.valueOf("my-tag"), although that solution is of course quite horrible from a performance point of view.

I'm not sure whether it would make sense to just add an overload of valueOf that also takes a boolean for controlling the selfClosing field, or if it would be wiser to introduce a TagBuilder that would allow customising all aspects of the configurations affecting how the tag is output.
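To make the two options concrete, here is a minimal sketch of what they might look like side by side. Nothing in it is jsoup API: SketchTag, its Builder, and the block property are hypothetical names chosen for illustration, and the defaults are assumptions.

```java
// Hypothetical sketch: SketchTag and SketchTag.Builder are illustration-only, not jsoup classes.
final class SketchTag {
    private final String name;
    private final boolean selfClosing;
    private final boolean block;

    private SketchTag(Builder b) {
        this.name = b.name;
        this.selfClosing = b.selfClosing;
        this.block = b.block;
    }

    String name() { return name; }
    boolean isSelfClosing() { return selfClosing; }
    boolean isBlock() { return block; }

    // Option 1: an overload-style factory that also takes the selfClosing flag.
    static SketchTag valueOf(String name, boolean selfClosing) {
        return builder(name).selfClosing(selfClosing).build();
    }

    // Option 2: a builder that exposes every output-related setting.
    static Builder builder(String name) { return new Builder(name); }

    static final class Builder {
        private final String name;
        private boolean selfClosing = false; // same default as reported in this issue
        private boolean block = true;        // arbitrary default for the sketch

        private Builder(String name) {
            if (name == null || name.trim().isEmpty()) {
                throw new IllegalArgumentException("tag name must not be empty");
            }
            this.name = name.trim();
        }

        Builder selfClosing(boolean value) { this.selfClosing = value; return this; }
        Builder block(boolean value) { this.block = value; return this; }
        SketchTag build() { return new SketchTag(this); }
    }
}
```

With this shape, SketchTag.valueOf("my-tag", true) covers the simple case, while SketchTag.builder("my-tag").selfClosing(true).block(false).build() leaves room for further settings without adding more overloads.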

Legioth avatar Apr 19 '15 08:04 Legioth

On the other hand, this might not make any sense at all since "html5" has no concept of self-closing tags as explained in e.g. http://stackoverflow.com/a/3558200 other than for "foreign" elements, i.e. MathML and SVG elements. The Custom Elements specification doesn't define such elements as "foreign" nor does it define any new syntax that would support self-closing tags.

Legioth avatar Apr 19 '15 16:04 Legioth

Jsoup supports XML parsing and generation, not just HTML. Support for non-HTML5 self-closing tags is mandatory for XML generation. It would be much nicer to be able to either pass a flag to Tag#valueOf or modify the self-closing attribute. I use the following utility method for creating self-closing elements in XML:

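public static Element createVoidElement(String tagName, String baseUri) {
    // Parse a single self-closing tag with the XML parser so the resulting tag is flagged as self-closing.
    Document document = Jsoup.parse("<" + tagName + "/>", baseUri, Parser.xmlParser());
    document.outputSettings().prettyPrint(false);
    document.outputSettings().escapeMode(Entities.EscapeMode.xhtml);

    // If there is no <body> (as with XML input), use the document's first child instead.
    return document.body() == null ? document.child(0) : document.body().child(0);
}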

As described above, the parsed Element will have the self-closing (internal) flag set to true and will render as expected.

gitastrophe avatar Mar 24 '16 04:03 gitastrophe

The problem with allowing the self-closing property to be changed is that a developer could set it to true on an element that has children, which would result in incorrect XML. I think the model is fine; just add some static methods that let us register our own tags. Nobody has dynamic tags: we all have specific tags for our projects, which can be defined at development time. If the requirements grow, tags get added or changed, and that's it.
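A rough sketch of the static registration idea follows. It also shows one way to sidestep the children problem: the self-closing form is only written when the element is actually empty. TagRegistrySketch and its methods are hypothetical names, not part of jsoup, and children are represented as pre-serialized strings purely to keep the example short.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: TagRegistrySketch is not a jsoup class.
final class TagRegistrySketch {
    // Tag name -> whether an empty element may be written as <name />.
    private static final Map<String, Boolean> SELF_CLOSING = new ConcurrentHashMap<>();

    private TagRegistrySketch() {}

    // Called once per project-specific tag, e.g. from a static initializer.
    static void register(String tagName, boolean selfClosing) {
        SELF_CLOSING.put(tagName, selfClosing);
    }

    // Unregistered tags default to not self-closing.
    static boolean isSelfClosing(String tagName) {
        return SELF_CLOSING.getOrDefault(tagName, false);
    }

    // Renders an element whose children are already serialized.
    // The self-closing form is used only when there are no children,
    // so a registered tag with children still yields well-formed output.
    static String render(String tagName, List<String> children) {
        if (children.isEmpty() && isSelfClosing(tagName)) {
            return "<" + tagName + " />";
        }
        StringBuilder sb = new StringBuilder("<").append(tagName).append('>');
        for (String child : children) {
            sb.append(child);
        }
        return sb.append("</").append(tagName).append('>').toString();
    }
}
```

After TagRegistrySketch.register("my-tag", true), an empty my-tag renders in self-closing form, while the same tag with children falls back to an explicit start and end tag.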

Nopalin avatar Dec 08 '16 06:12 Nopalin

Is there a way I can create a tag like <my-tag/> using jsoup?

soorapadman avatar Apr 05 '17 14:04 soorapadman

For anyone who needs to convert self-closing tags to non-self-closing tags, there is a hacky solution using PowerMock:

Document document = Jsoup.parse(...);
document.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        if (node instanceof Element) {
            Element el = (Element) node;
            Whitebox.setInternalState(el.tag(), "selfClosing", false);
        }
    }

    @Override
    public void tail(Node node, int depth) {
    }
});

nicoloboschi avatar May 13 '21 06:05 nicoloboschi

To transform

<a>
  <b>
</a>

into

<a>
  <b />
</a>

you should enable XML output syntax so that the generated markup is well-formed:

document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);

See Element#outerHtmlHead

alexis779 avatar Oct 28 '21 05:10 alexis779