maven
maven copied to clipboard
[MNG-5862] Support XML entities and XInclude
- [MNG-5862] XML entities support
- [MNG-5862] Support XInclude
- IT PR
This PR is built on top of #1245 which can be integrated separately. This feature could be opt-in or even extracted as a separate extension (hence the PR to give the needed support).
Note that the tests do ensure that any POM installed/deployed is a standalone one, i.e. the xinclude / entities are inlined during the consumer pom transformation step.
I think this PR only supports external entities, not XInclude?
Both, but the code for XInclude support is in third-party library for now: https://github.com/gnodet/stax-xinclude Also, the IT uses xinclude at https://github.com/apache/maven-integration-testing/pull/279/files#diff-f29a4cbe7d6f8b4fb5f7534e23bfe9622d437605cf1df65821d94389d2efe5e6R29
There might be some security implications here. There certainly will be people who claim there are.
Those new options are conditioned by reading a local file with strict mode (which means not when loading a POM from a dependency, i.e. only for projects being built). If needed, we could use a custom resolver and restrict to loading entities / xinclude from the file system, or even the project tree, but I'd like to avoid any restriction if they are not needed. As the files are loaded from locally built POM files, the user should be in control of those files.
This likely needs doc changes as well.
Yes, I'll try to find where...
There might be some security implications here. There certainly will be people who claim there are.
Those new options are conditioned by reading a local file with strict mode (which means not when loading a POM from a dependency, i.e. only for projects being built). If needed, we could use a custom resolver and restrict to loading entities / xinclude from the file system, or even the project tree, but I'd like to avoid any restriction if they are not needed. As the files are loaded from locally built POM files, the user should be in control of those files.
I've modified the xinclude support so that it reuses the XMLResolver. This allows using a single implementation to load external stuff. I've thus modified it to reject any non relative URI, which means the code can only access files under the user's control, so that should be fine from a security perspective.
Is this wise? XML parsers are littered with a long history of security vulnerabilities around loading crap remotely. Introducing that into a build system seems scary to me.
Is this wise? XML parsers are littered with a long history of security vulnerabilities around loading crap remotely. Introducing that into a build system seems scary to me.
I tried to mitigate the risks by only loading from rejecting any absolute URI. Also, no file with such entities / xinclude import should end up in maven central, those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml that are parts of the build. The security problems are then kinda in the hand of the developper I would think.
That's said, I'd like to find a consensus, and if it's seen as too risky, we could go with only mixins. That's the whole point of the discussion I started on dev.
I tried to mitigate the risks by only loading from rejecting any absolute URI. Also, no file with such entities / xinclude import should end up in maven central, those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml that are parts of the build. The security problems are then kinda in the hand of the developper I would think.
That's said, I'd like to find a consensus, and if it's seen as too risky, we could go with only mixins. That's the whole point of the discussion I started on dev.
Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)? That could help allay concerns, in that users would have to explicitly opt-in (or have an ability to explicitly opt-out) to enabling this new potentially risky behavior.
I tried to mitigate the risks by only loading from rejecting any absolute URI. Also, no file with such entities / xinclude import should end up in maven central, those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml that are parts of the build. The security problems are then kinda in the hand of the developper I would think. That's said, I'd like to find a consensus, and if it's seen as too risky, we could go with only mixins. That's the whole point of the discussion I started on dev.
Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)? That could help allay concerns, in that users would have to explicitly opt-in (or have an ability to explicitly opt-out) to enabling this new potentially risky behavior.
I've added a way to opt-out using -Dmaven.experimental.xinclude=false
...
Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)? That could help allay concerns, in that users would have to explicitly opt-in (or have an ability to explicitly opt-out) to enabling this new potentially risky behavior.
I've added a way to opt-out using
-Dmaven.experimental.xinclude=false
...
Thank you!
@elharo can I go ahead with this PR ?
I'm still not convinced this is a good idea, but if it's going in, there are still some ciode issues to be addressed.
Well, I tried to kick a discussion on dev@ but not much interest there... The only concerns were the same that have been raised here around security, and while I agree it could have been a problem, I think those concerns have been addressed in the PR. So I assume lazy consensus...
The idea of xinclude/xml entities is slightly redundant with pom mixins, so I would have expected that this would have been raised... This discussion was started mid-august, definitely not the best time to have people involved.
FYI, just from a simple user's perspective, it seems peculiar that "experimental" (maven.experimental.xxx
) features are being enabled by default.
Typically I would expect a feature/option that a software developer considers "experimental" to be something that a user has to opt-in to enable, not opt-out to disable.
I'm not sure if that means these new features should be disabled by default, or if the options should be renamed to better indicate they aren't considered "experimental"?
FYI, just from a simple user's perspective, it seems peculiar that "experimental" (
maven.experimental.xxx
) features are being enabled by default.Typically I would expect a feature/option that a software developer considers "experimental" to be something that a user has to opt-in to enable, not opt-out to disable.
I'm not sure if that means these new features should be disabled by default, or if the options should be renamed to better indicate they aren't considered "experimental"?
I agree, I think those "experimental" flags will need to be removed or the features disabled by default before GA. But you're right, experimental + enabled by default looks borderline.
@gnodet Thanks for you work. Any plans on resolving the requested change?
One possibility would be to create an extension to support this feature. This should be possible as the entities/xinclude are only processed at build time and the consumer pom is flattened.
Mostly nits.
They should be all fixed now.
More generally, rereading this I'm struck at how much work this is. You've provided a nearly complete XInclude implementation. The maintenance burden on this code is likely to be high. It feels like this belongs in the parser, or perhaps a separate library on top of the parser, and all Maven should do when reading a pom is flip the flag to turn on XInclude support. Maven's XML parsing is already quite non-standard and avoids the normal parsers like Xerces or the JDK's because of decisions made 20 years ago when the easier path was not so obvious. This is unfortunately baked into a lot of the code and even the public APIs, so I'm not sure how much of that can be repaired now.
Nonetheless, I do wonder if this code belongs here instead of in the parser.
We switched to a standard StaX parser in 4.x. The xml parser is now Woodstox, which is a fully conformant parser. Unfortunately, I don't know any StaX parser that provides support for XInclude, so I ended up writing one, borrowing the XPointer implementation from Apache Woden. I'd rather keep it separated from the parser so that we can eventually switch to Aalto which is slightly faster than Woodstox, but there were a few problems when I tried initially.
As for the location, I originally put it in an external repository (https://github.com/gnodet/stax-xinclude), but you asked to move it back in this repository. I agree, this library is independent of Maven and would be better outside. I can maintain it as a personal project (and rename the package) if you think it's better, I don't really mind.
A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?
A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?
@cowtowncoder ? @scantor ?
A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?
@cotowncoder ? @scantor ?
Umm. Maybe I was referenced by mistake, but I'm not sure of my relevance to this? (If this is the context of the Apache Xerces project, while I'm on the PMC, I'm a committer on the C++ side, I stopped using the Java version a long time ago, and use the JDK for that exclusively now.)
As a maintainer of software that uses Maven itself, I certainly have "opinions" but I think you all addressed the main one, which is to turn this off by default.
All I can say is, please don't ever enable this by default. Ever. I say that as somebody that's worked with XML for 25 years and has been fixing security bugs in the use of it for about 20. Maybe that's why you asked me?
The stigma surrounding xinclude is really quite infuriating. Is there a feature more maligned than this? The disparity between its notoriety and its simplicity is something almost poetic.
The fact is pom files already compose in at least 3 other ways - so if there's some "security principle" at play its already broken. It could be argued the whole purpose of pom files is to compose. Whether processing xinclude introduces risks has nothing to do with XML per se (a standard designed to be as open and dynamic as possible so no wonder there's so many horror stories) and everything to do with this implementation. You'd have to be half-mad not to use this one anyway.
I'm also not sure that xincludes will really work in practice, but I've yet to see why not.
The stigma surrounding xinclude is really quite infuriating. Is there a feature more maligned than this?
Yes, entities, deservedly so. That was actually more my concern. I have no background on what this issue is about or why I was asked about it, but apologies for not clarifying which of the two I was expressing the opinion about. I know very little about XInclude as it's very poorly supported in general, it came too late, so I've generally ignored it.
My opinion is that it's not sensible for Maven to even consider including a one-off implementation of any XML standard addendum. But that's a decision for its maintainers.
I remember thinking about XInclude at some point wrt Woodstox, but it seemed rather complicated to support for a streaming parser. So likely it'd instead need to be some processor on top of generic Stax (or SAX interface), in which case Woodstox as well as any other compliant implementation would work.
I don't think I personally have time to work on such thing, although I can see how it'd be useful.
This should become a separate Maven extension.