maven-release icon indicating copy to clipboard operation
maven-release copied to clipboard

Replace JDom with Domtrip

Open kwin opened this issue 5 months ago • 30 comments

Affected version

3.1.1

Bug description

Lots of issues are related to how rewriting POMs with JDOM works (e.g. wrong indentation). By switching from JDom to https://github.com/maveniverse/domtrip we should get a much more solid codebase.

kwin avatar Jul 27 '25 09:07 kwin

@cstamas Is DOMTrip ready for using it inside m-release?

kwin avatar Jul 27 '25 09:07 kwin

@gnodet ping, but imo yes

cstamas avatar Jul 27 '25 10:07 cstamas

@cstamas Is DOMTrip ready for using it inside m-release?

This would make Maven Release unusable with Maven 3 because it requires Java 17.

michael-o avatar Jul 28 '25 14:07 michael-o

Java 17 is fully supported in plugins/extensions since 3.9.6: https://github.com/apache/maven/issues/8732

kwin avatar Jul 28 '25 20:07 kwin

Java 17 is fully supported in plugins/extensions since 3.9.6: apache/maven#8732

Require, not support

michael-o avatar Jul 29 '25 06:07 michael-o

Java 17 is fully supported in plugins/extensions since 3.9.6: apache/maven#8732

Require, not support

One possibility would be to keep both in the class path and try to use Domtrip if possible, else default to JDom.

gnodet avatar Jul 30 '25 22:07 gnodet

I still don’t see why

This would make Maven Release unusable with Maven 3 because it requires Java 17.

As said before Java 17 plugins work fine in Maven 3.9.6. If it is only dependencies in Java 17 bytecode without Sisu logic even older ones.

kwin avatar Jul 31 '25 05:07 kwin

I still don’t see why

This would make Maven Release unusable with Maven 3 because it requires Java 17.

As said before Java 17 plugins work fine in Maven 3.9.6. If it is only dependencies in Java 17 bytecode without Sisu logic even older ones.

This I don't understand since Domtrip has a hard requirement on 17.

michael-o avatar Jul 31 '25 05:07 michael-o

Right, but the question is why you consider this incompatible with Maven 3….

kwin avatar Jul 31 '25 05:07 kwin

Right, but the question is why you consider this incompatible with Maven 3….

I didn't say that. Please re-read. Maven 3 won't run on Java 8 anymore.

michael-o avatar Jul 31 '25 06:07 michael-o

You said

This would make Maven Release unusable with Maven 3

I beg to differ

kwin avatar Jul 31 '25 06:07 kwin

You said

This would make Maven Release unusable with Maven 3

I beg to differ

This I don't neither how it would continue to run on 8 with 17 bytecode

michael-o avatar Jul 31 '25 06:07 michael-o

I'm skeptical. These sorts of projects usually miss large chunks of XML. They work well enough on some subset the inventors thought of, and fall apart pretty quickly when faced with the full panoply of what's actually possible. And in general you really don't want to mix lexical analysis and formatting with syntactic level markup. JDOM at least is unlikely to produce anything malformed.

elharo avatar Nov 04 '25 20:11 elharo

I'm skeptical. These sorts of projects usually miss large chunks of XML. They work well enough on some subset the inventors thought of, and fall apart pretty quickly when faced with the full panoply of what's actually possible. And in general you really don't want to mix lexical analysis and formatting with syntactic level markup. JDOM at least is unlikely to produce anything malformed.

DomTrip keeps the whole infoset (and more), unlike the DOM API. JDOM looses many things, one of them is the attribute order. Not speaking quotes, entities, white spaces. JDOM clearly cannot to round-tripping.

If you can point to a better library, I'm all ears...

gnodet avatar Nov 05 '25 13:11 gnodet

I agree JDOM doesn't do round tripping. Things like attribute order are not part of the infoset and should not be depended on by code. What I'm worried about is how reliable this new library is. I have found a lot of bugs in a lot of "better" XML libraries over the years. Most were deep design flaws that assumed XML was something other than it actually is. JDOM is one of a handful that I am confident can process most real world XML documents without making mistakes. (There are extreme cases no current library I know of can handle.)

What problem are we actually trying to solve here?

elharo avatar Nov 05 '25 13:11 elharo

For example #1304 or #1274

kwin avatar Nov 05 '25 13:11 kwin

I agree JDOM doesn't do round tripping. Things like attribute order are not part of the infoset and should not be depended on by code. What I'm worried about is how reliable this new library is. I have found a lot of bugs in a lot of "better" XML libraries over the years. Most were deep design flaws that assumed XML was something other than it actually is. JDOM is one of a handful that I am confident can process most real world XML documents without making mistakes. (There are extreme cases no current library I know of can handle.)

What problem are we actually trying to solve here?

Round tripping is what we're trying to solve. XML Parsers are not meant to do round tripping and fail short in a lof of use cases. A lot of information is lost by traditional parsers which aim for performances and definitely not round tripping.

DomTrip has been written to solve this very use case, and afaik, there's no other library available.

gnodet avatar Nov 05 '25 14:11 gnodet

There was one, long time ago but was abandoned (or proactively dropped/deleted by its author, unsure). It's still reachable wiki explains some of the problems decentxml and domtrip tries to solve: https://code.google.com/archive/p/decentxml/wikis/Tutorial.wiki

cstamas avatar Nov 05 '25 14:11 cstamas

Let's go deeper. What problem does round tripping solve? Why does it matter if the attributes in an element come out in a different order or the type of quote mark around an attribute value changes?

elharo avatar Nov 05 '25 15:11 elharo

Simpler and cleaner SCM diffs, in 99% of cases (release plugin, but also things like this: https://github.com/maveniverse/parent/commit/6de618dece057c1dc05abd2beab8f681c92fe332 -- this is mvn toolbox:version -Dapply and is domtrip in action). But in general, if I apply toolbox:versions and it updates my user wide extensions.xml, even if that file is unversioned, I'd hate if it would fully reformat it (or drop comments I left for myself or whatever). It should "edit" the file just as I would (user would).

cstamas avatar Nov 05 '25 15:11 cstamas

Dropping comments shouldn't be an issue. JDOM can easily handle that.

It also shouldn't fully reformat the file unless you've configured JDOM to do that. It might adjust whitespace inside tags and reorder attributes, but that's pretty minor.

elharo avatar Nov 05 '25 15:11 elharo

No, is not minor, sorry. In fact, is unacceptable, as we've been there and we moved past that.

cstamas avatar Nov 05 '25 15:11 cstamas

Plus, let me remind you that we talk about Maven XML that covers Maven 3 as well, and as you know, XML used in Maven 3 has own quirks.

Essentially, the goal is to (programatically) "edit" the XML files (POM, extensions.xml and settings.xml are currently supported by DomTrip) as a human would, and to not reorder anything, to not swallow comments, to not remove CDATA sections, to not replace whitespace (if I mixed tabs and spaces, let it be so), and so on and so on. I want to see clean diff with sensible changes only.

cstamas avatar Nov 05 '25 15:11 cstamas

Dropping comments shouldn't be an issue. JDOM can easily handle that.

XML parsers such as JDOM completely discard comments in the XML prolog (before the xml PI), as this is not part of the Infoset. This usually end up with removing the license header from pom files, which is really wrong.

gnodet avatar Nov 05 '25 15:11 gnodet

Assuming by "xml PI" you mean the XML declaration, there are no such documents. The XML 1.0 grammar does not allow comments to appear before the XML declaration. Any document that has one there is not well-formed and should be immediately rejected by the parser.

This is exactly the sort of picky detail that "better" XML libraries usually get wrong, and why I'm nervous about adopting a new one. XML is designed to be simple to write, and perhaps simple to process compared to SGML, but it's not really simple to parse or model.

elharo avatar Nov 05 '25 18:11 elharo

Assuming by "xml PI" you mean the XML declaration, there are no such documents. The XML 1.0 grammar does not allow comments to appear before the XML declaration. Any document that has one there is not well-formed and should be immediately rejected by the parser.

This is exactly the sort of picky detail that "better" XML libraries usually get wrong, and why I'm nervous about adopting a new one. XML is designed to be simple to write, and perhaps simple to process compared to SGML, but it's not really simple to parse or model.

I meant the comments and spaces that are typically between the xml declaration and the first element.
Those comments are allowed, but not part of the infoset, and for this very reason, usually discarded by parsers.

gnodet avatar Nov 05 '25 22:11 gnodet

JDOM doesn't pay a lot of attention to the infoset as such. They were developed about the same time. It does, as do most parsers, handle comments that appear before the root element. I'm not sure where the idea that parsers don't handle these comments comes from. I don't think I've ever noticed any parser that doesn't process these in the expected way.

White space before and after the root element is a different story, but again that's trivial and not important in practice.

elharo avatar Nov 06 '25 14:11 elharo

so looking at the code in this repo, I have a sneaking suspicion that JDOM is configured and used incorrectly and that's causing 90% of the issues you're seeing. Improving the JDOM calls would be a lot easier and safer than building and depending on a completely new library.

elharo avatar Nov 06 '25 14:11 elharo

Also, I haven't been following JDOM for a while but look at https://github.com/hunterhacker/jdom/pull/156

It seems like it might even be ready to handle prolog whitespace.

elharo avatar Nov 06 '25 14:11 elharo

Also, I haven't been following JDOM for a while but look at hunterhacker/jdom#156

It seems like it might even be ready to handle prolog whitespace.

Here's a (non exhaustive) test about what JDOM will fail during round-tripping. https://github.com/maveniverse/domtrip/pull/90/files

At the end, JDOM relies on the underlying XML parser, usually SAX. None of the existing parser reports everything and JDOM is not able to handle whitespaces anyway.

So we can continue to argue, but it's really about whether we want a better round-tripping solution or not. Users have complained about some stuff since years. Now that we have a solution, I'd rather use it than just keep saying our users that we don't care. And AFAIK, DomTrip is the only alternative here.

gnodet avatar Nov 06 '25 16:11 gnodet