[MNG-8537] Maven 4 namespace should not change
Elliotte Rusty Harold opened MNG-8537 and commented
Just noticed that Maven 4 has a new namespace URL. The old namespace was
http://maven.apache.org/POM/4.0.0
The new one is
http://maven.apache.org/POM/4.1.0
Putting version numbers in namespaces is a known XML antipattern because it makes it extremely difficult and inconvenient to write tools that process both versions, even when they are much the same. XSLT, DOM, XQuery, JDOM, and really any XML-aware tool are going to have problems with this.
Model version 4.1.0 is not a new and different schema that completely breaks with the past. Most old elements from 4.0.0 are still present and still mean exactly the same thing: groupId, artifactId, name, dependency, and most others. There are new elements but that doesn't imply a new namespace. Adding a new namespace is asserting that all the elements are different.
IMHO Maven 4 should not change the namespace URL.
Issue Links:
- MNG-8639 Check namespaces when reading XML models
Guillaume Nodet commented
The decision to put the version number in the namespace was made 20 years ago.
I think the real problem was that Maven never properly versioned the schema when changing it. I agree that if we do minor changes to the schema, we should not change the namespace, but version it properly. This has never been done and we definitely should.
We may be able to decouple the version of the schema from the model version. Maven itself doesn't care much about the namespace. Even when reading, there's no versioning / namespace checks at all.
A good first step was taken a few months ago when we switched to a conformant XML parser!
Anyway, this is neither a bug, nor blocking.
Elliotte Rusty Harold commented
It's blocking because this change needs to be made before release, if ever. That is, keep the namespace URI http://maven.apache.org/POM/4.0.0 for the foreseeable future.
This might not be an issue for Maven's own code that ignores namespaces, but it's a big honking deal for anyone who wants to use XML tools to process pom files. Bad namespace hygiene in pom.xml files made for a lot of extra work when I was scanning Maven central for linkage checking, for instance. It effectively prevented the use of XPath and XSLT.
If Maven 4 still isn't checking for the right namespace when processing, then that's something else that needs to be fixed before release.
And now that I think about it, this might be a huge issue for XInclude if it's trying to include pom 4 content into pom 4.1. I need to take a look at that.
Guillaume Nodet commented
> If Maven 4 still isn't checking for the right namespace when processing, then that's something else that needs to be fixed before release.
That may break a bunch of things. The namespace has never been mandatory afaik. Maven Central Repository contains POM files which have no namespace, for example:
https://repo1.maven.org/maven2/aopalliance/aopalliance/1.0/aopalliance-1.0.pom
We do need to support those, so the only thing we could do for now would be to log a warning. But given our policy to only log warnings when the user can do something about them, I don't see any good solution.
> And now that I think about it, this might be a huge issue for XInclude if it's trying to include pom 4 content into pom 4.1. I need to take a look at that.
The fact that the namespace is not enforced does not mean it cannot be used. If you try to include pom 4 content into pom 4.1 or even the opposite, it will just work, because namespaces aren't enforced for now.
Elliotte Rusty Harold commented
I'll deal with the XInclude issues over on
https://github.com/apache/maven-xinclude-extension/issues/20
The problems I suspect are a little more subtle than that.
Even if we live with the existing broken namespace handling in model version 4.0, we should be able to require the namespace in modelVersion 4.1.0 and error if it's not there; and that's true no matter what the namespace URI for that modelVersion is.
Guillaume Nodet commented
> Even if we live with the existing broken namespace handling in model version 4.0, we should be able to require the namespace in modelVersion 4.1.0 and error if it's not there; and that's true no matter what the namespace URI for that modelVersion is.
Yes, it should be fairly easy to add something along the following lines to the file validation: https://github.com/apache/maven/blob/98dedaf0e355063bd076cc746a219a3bc8f4ac69/impl/maven-impl/src/main/java/org/apache/maven/internal/impl/model/DefaultModelValidator.java#L306
if (!Objects.equals(m.getModelVersion(), ModelBuilder.MODEL_VERSION_4_0_0)) {
    validateStringNotEmpty("project namespace", problems, Severity.FATAL, Version.BASE, m.getNamespaceUri(), m);
}
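For readers without the Maven code base at hand, the same rule can be sketched outside of Maven. This is only an illustration under stated assumptions: it does not use Maven's validator API, the class `PomNamespaceCheck` and its `check` method are hypothetical names, and the POM is read with the JDK's DOM parser rather than through Maven's model classes.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Maven.
class PomNamespaceCheck {

    /** Returns a problem description if a non-4.0.0 model has no namespace, otherwise empty. */
    static Optional<String> check(String pomXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required, or getNamespaceURI() is always null
        Element project = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pomXml)))
                .getDocumentElement();

        // null when the root element carries no xmlns declaration
        String namespace = project.getNamespaceURI();

        // modelVersion is a child element, so scan the direct children for it
        String modelVersion = null;
        for (Node n = project.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "modelVersion".equals(n.getLocalName())) {
                modelVersion = n.getTextContent().trim();
            }
        }

        // Mirrors the condition in the snippet above: anything other than 4.0.0 needs a namespace
        if (!"4.0.0".equals(modelVersion) && (namespace == null || namespace.isEmpty())) {
            return Optional.of("project namespace is missing for modelVersion " + modelVersion);
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<project><modelVersion>4.0.0</modelVersion></project>"));
        System.out.println(check("<project><modelVersion>4.1.0</modelVersion></project>"));
    }
}
```

Legacy POMs without a namespace, such as the aopalliance one linked earlier, still pass because they declare modelVersion 4.0.0.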
Guillaume Nodet commented
I've created MNG-8639 to validate namespace consistency and enforce the use of a namespace when reading XML models with modelVersion > 4.0.0.
Guillaume Nodet commented
I don't think this is really achievable. We will probably release a subsequent schema that will remove the deprecated elements. In addition having a fixed namespace would require the parser to eventually be able to know which version of the model is loaded. Unfortunately, the modelVersion is an element and not an attribute, so it may be put at the very end of the xml. Plus the fact that the poms uploaded with the 4.0.0 namespace may not even be XML conformant...
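The ordering concern can be illustrated with a streaming parser. The following is a minimal sketch using the JDK's StAX API, not Maven's or Modello's actual reader; the class `ModelVersionProbe` and its `Result` type are hypothetical. It shows that the root element's namespace is known at the very first start tag, while the modelVersion is only known once the parser happens to reach that element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Maven or Modello.
class ModelVersionProbe {

    static final class Result {
        final String rootNamespace;   // known at the first start tag
        final String modelVersion;    // null if the element never appears
        final int childrenBefore;     // children of <project> read before <modelVersion>

        Result(String rootNamespace, String modelVersion, int childrenBefore) {
            this.rootNamespace = rootNamespace;
            this.modelVersion = modelVersion;
            this.childrenBefore = childrenBefore;
        }
    }

    static Result probe(String pomXml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(pomXml));

        String rootNamespace = null;
        int depth = 0;
        int childrenBefore = 0;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (depth == 1) {
                    rootNamespace = reader.getNamespaceURI();
                } else if (depth == 2) {
                    if ("modelVersion".equals(reader.getLocalName())) {
                        // Only now does the parser learn the model version
                        return new Result(rootNamespace, reader.getElementText().trim(), childrenBefore);
                    }
                    childrenBefore++;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return new Result(rootNamespace, null, childrenBefore);
    }

    public static void main(String[] args) throws Exception {
        Result r = probe("<project xmlns=\"http://maven.apache.org/POM/4.1.0\">"
                + "<groupId>g</groupId><artifactId>a</artifactId>"
                + "<modelVersion>4.1.0</modelVersion></project>");
        System.out.println(r.rootNamespace + " " + r.modelVersion + " " + r.childrenBefore);
    }
}
```

In the example POM, two sibling elements have already been consumed by the time the model version is available, which is the situation a single-pass reader keyed on modelVersion would have to handle.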
Elliotte Rusty Harold commented
That isn't how namespaces are supposed to work. Namespaces are not versions. Adding and removing elements need not and should not change the namespace URI. Ditto any other schema evolution.
Elliotte Rusty Harold commented
And if the files aren't XML, then namespaces aren't really being used at all.
Guillaume Nodet commented
I think the major point here is the fact that the modelVersion may be set at the end. So the parser would only know which model version is used at the end? That's just not gonna work. Also I doubt Modello has been architected with the ability to support that (though I may be wrong).
If modelVersion were moved to an attribute, it may be possible to get that working, but that would require a brand new namespace anyway. That one could be made to not change in the future.
But please, be my guest, go ahead and fix the problem. You raised the issue 3 months ago, the new namespace was introduced 18 months ago, and the decision to version namespaces was made 20 years ago, without actually using the namespaces, having non conformant XML, etc...
I'm not sure I'm willing to spend a week of my time for that.
Guillaume Nodet commented
> And if the files aren't XML, then namespaces aren't really being used at all.
Yes, agreed. And that makes me think that there's no real XML tool processing Maven POMs other than the Maven parser so far...
So cleaner namespace support would be interesting, but it only becomes useful if we have a more extensible schema with the ability for extensions to plug in.
Elliotte Rusty Harold commented
Chicken and egg. I've personally tried to write tools to process poms based on regular XML tools like XSLT and failed precisely because the namespaces were broken.
Guillaume Nodet commented
Yes, well, a first step will be available once 4.0 is released: we'll have a new namespace that all 4.1.0 models will use, and they will be XML conformant. I'd like to be able to validate any POM before it's synced to Central, but we need to ask Sonatype if that's doable. The problem is that Maven-produced POMs will most probably be fine, but we have no control over the ones generated by Gradle or any other tool. So XML conformance and the namespace need to be enforced in Central, not only in Maven itself.
Elliotte Rusty Harold commented
Is that a new problem? Today anybody can put anything, valid or well-formed or not, in a file and call it a pom.xml. I'm not sure what they could slip through Maven Central, but they can certainly put it in a local repo. Maven defines the pom format, and it's OK to put new requirements in new model versions and for Maven to reject poms with that model version that don't follow the documented rules.
It would be nice if we had cleaner separation between build tools and repository systems, but if wishes were horses and all that.
Elliotte Rusty Harold commented
Thinking about this further, it occurs to me that the use of XML tools is a strong reason why XML namespaces should not change. Changing the namespace breaks any XML-based tools that have already been written to support the old namespace.
Example: suppose I used non-Maven based XML tools to read dependency graphs out of poms using XPath and/or XSLT. (Hypothetical only because I tried this and it didn't work precisely because of the pre-existing namespace issues.) Now the namespace changes in new poms. Now my code no longer works on any new pom, and likely returns incorrect results. The code would have worked had new elements been added and others removed that did not directly affect dependencies. That is, changing the namespace is a global breakage for existing code.
http://maven.apache.org/POM/4.0.0 is the namespace. A quick spot check shows that's been the case since at least 2008, maybe longer. It should remain so for the foreseeable future. We can't fix what's already been published to Maven Central, but we can stop the bleeding.
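The breakage described above can be reproduced with the JDK's own XPath API. This is a minimal sketch, not code from any existing tool: the class `PomXPathDemo`, the `m` prefix, and the tiny inline POMs are made up for illustration. A query written against the 4.0.0 namespace finds the dependency in a 4.0.0 POM and silently finds nothing in an otherwise identical 4.1.0 POM.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; stands in for a tool written against the 4.0.0 namespace.
class PomXPathDemo {
    static final String POM_4_0_0 = "http://maven.apache.org/POM/4.0.0";
    static final String POM_4_1_0 = "http://maven.apache.org/POM/4.1.0";

    /** Builds a tiny POM that differs only in its namespace URI. */
    static String pom(String namespaceUri) {
        return "<project xmlns=\"" + namespaceUri + "\">"
                + "<dependencies><dependency>"
                + "<groupId>g</groupId><artifactId>a</artifactId><version>1</version>"
                + "</dependency></dependencies></project>";
    }

    /** Counts dependencies with the "m" prefix bound to the 4.0.0 namespace only. */
    static int countDependencies(String pomXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(pomXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "m".equals(prefix) ? POM_4_0_0 : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return null; // not needed for evaluating expressions
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(
                "/m:project/m:dependencies/m:dependency", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countDependencies(pom(POM_4_0_0))); // prints 1
        System.out.println(countDependencies(pom(POM_4_1_0))); // prints 0
    }
}
```

Note that the second call does not fail; it returns an empty result, which is why such a tool would report incorrect data rather than an error.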
Guillaume Nodet commented
In order to stop the bleeding, we need to:
- make sure Maven 4 enforces the use of a namespace (that has been implemented with MNG-8639)
- enforce that POMs uploaded to Maven Central are valid. This is actually more than just checking that they adhere to the schema: we should do proper validation with our own ModelValidator, and ask Sonatype to include our checks while vetting uploaded bundles.
I fully support those, but that's somewhat unrelated to whether or not the namespace changes.
Digging into the code, I now notice that we're keying off of the namespace to select the model version, which means in the future we won't be able to change one without changing the other. We really need to decouple these before shipping 4.0.