jackrabbit-oak
jackrabbit-oak copied to clipboard
OAK-10544: oak-jcr: remapping a namespace prefix leaves namespace resolver in broken state
This is a minimal fix that actually works in that it fixes the bug, and introduces no regressions in existing tests. Whether it's the right place I'm not sure yet.
SonarCloud Quality Gate failed. 
0 Bugs
0 Vulnerabilities
0 Security Hotspots
1 Code Smell
0.0% Coverage
0.0% Duplication
The version of Java (11.0.21) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here
Catch issues before they fail your Quality Gate with our IDE extension
SonarLint
@mreutegg , @anchela - this needs reviews.
Reviewers: please see https://issues.apache.org/jira/browse/OAK-10544?focusedCommentId=17807761&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17807761
It might not introduce regressions in existing test, but I think the change is very problematic.
I see that the JCR spec says this should be the "correct" behavior, but I think we shouldn't follow the spec here in Oak. Instead, in my view the caller should not use this weird edge case. Instead, he should explicitly set the local mapping.
The reason is, you have changed "getJcrName" (a getter method that looks completely innocent) to modify state of the session. The "getJcrName" is used all over the place. This can lead to concurrency problems if the map is not a concurrent map, and other problems. For example, QueryEngine uses NO_BINDINGS = emptyMap() -- which is immutable. So if you execute a query with a non-mapped URI, then the query would fail with a strange exception (right now it doesn't).
I understand the concern (and yes, we need tests in oak-core as well).
That said, I don't think that not fixing this is a real option. This works in Jackrabbit classic and causes real problems in practice. The "caller" is just trying to import a content packageand has zero information about what causes the NS prefix clash.
FWIW, "getJcrName()" can not possibly work in this case unless it can add a mapping. This needs to be either permantent (repository registry) or temporarily (session mappings). Of these choices, the second looks less intrusive to me.
Note that this restores a behavior we had in Oak as well; see https://github.com/apache/jackrabbit-oak/commit/250bcb5ba4e0e8841d7c4d0378ba7ff85f10ded3 (10 years ago).
So if you execute a query with a non-mapped URI, then the query would fail with a strange exception (right now it doesn't).
Well. The choices are:
- Produce a broken name that will not work (current behavior).
- Produce a broken name that will not work and potentially log that a s problem (WARN?)
- Produce a name that does work by registering the prefix in the session.
My preference is a proper fix; I understand your concerns and will have a closer look, also adding tests closer to the code that we want do modify.
https://github.com/apache/jackrabbit-oak/pull/1199/commits/bf485dfdf224554b51d8cfe50e067238b9e70fde addresses the "immutable" use case (doc, catch, test coverage).
Regarding the concurrency concern: sessions are not thread-safe, so this should not be critical.
The current PR addresses the user-visible problem with minimal surgery. We may want to discuss making this more pretty (for instance, by providing the prefix setter call back as additional argument), but my preference would be to do that in a separate issue.
Basically, the risk is that if we change this area, then existing applications (that rely on the current behavior for some reason) might be broken.
Currently we loop until we find a "free" prefix (ax2, ax3, ax4,...).
An alternative solution might be: instead of looping, we could calculate the hash code of the URI (e.g. ax_489acc8493f073ab). That would make prefixes very large, but we would always get the same prefix.
Basically, the risk is that if we change this area, then existing applications (that rely on the current behavior for some reason) might be broken.
They would rely on the result of "getJcrName()" to be broken :-). But yes, I see your point. The current PR is minimal in that the API signature is unchanged, but modified in the sense of "if you supply a modifiable map, we may actually modify it - your choice". But of course we can instead extend the API to ensure that only the one place we want to affect right now will be affected.
An alternative solution might be: instead of looping, we could calculate the hash code of the URI (e.g. ax_489acc8493f073ab). That would make prefixes very large, but we would always get the same prefix.
More predictable prefixes would be good. Consistency could also be achieved by exposing the prefix minting code in oak-core, so that oak-jcr gets the same prefix. At the end of the day, we always will have coupling; the difference is only whether it's explicit or implicit.
FWIW: minting prefixes could also be made smarter by special-casing widely-used namespace names, such as in use in XMP/EXIF etc.
@reschke May I also suggest a feature flag to enable the behavior? Maybe it only ever gets enabled in one environment, but that may be enough to reduce the pressure to broaden the scope of the change.
@mreutegg - that's an interesting suggestion; I'll try this with the test in oak-jcr and the proposed tests in vlt
UPDATE: indeed it does; and this fix doesn't cross core/jcr boundaries; will update the PR.