nokogiri
nokogiri copied to clipboard
Explore replacements for Xalan-J
See #1760 for more context.
The JRuby implementation of Nokogiri uses Xalan-J for XSLT support, but this thread reveals that development (and likely support) has stalled on that project.
A potential replacement, noted in the thread linked above, is Saxon. There may be other potential replacements as well.
The outcome intended for this issue is to do some exploration of replacing Xalan-J:
- is there a replacement that will functionally support what Xalan-J supports today?
- how difficult will a transition be?
Tagging @jvshahid and @kares for visibility
I don't believe there's been much forward momentum on this, but another motivator to make a move came up recently in the form of a CVE for Xalan:
https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-qwq9-89rg-ww72
The comment here asks whether we might be able to transition (at least) to the shipping Xalan in OpenJDK, to which the answer is "probably": https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-qwq9-89rg-ww72#advisory-comment-74545
In any case this should probably be addressed sooner than later since Xalan-J has been EOL for many years now, and is only being maintained in JDK as part of JDK. Saxon may be the best path forward, but using the JDK Xalan may be a short-term way to avoid the super-stale external version.
@headius Thanks for commenting. I may need help with the mechanics of switching to the OpenJDK version of Xalan. I've got a branch where I've updated the package paths, but there are a few places where the API has changed that I need to work through. I'll push it shortly.
I've created a draft PR migrating to the JDK Xalan that needs help to get unstuck: https://github.com/sparklemotion/nokogiri/pull/2632
Saxon may be the best path forward, but using the JDK Xalan may be a short-term way to avoid the super-stale external version.
@headius so the only downside is we will need to add-opens the internal com.sun.org.apache
package?
not sure if this is smt for JDK 17+, whether Nokogiri would be able to rely on those in the long-term ...
#2632 is a PR that @kares and I collaborated on to try to move Nokogiri away from upstream Xalan to using the JDK internal packages.
Unfortunately, relying on JDK internals doesn't seem to work well (at least not without a bit more work and potentially breaking changes). Java 17 disallows usage of internals with this error:
Java::JavaLang::IllegalAccessError: superclass access check failed: class nokogiri.internals.XalanDTMManagerPatch (in unnamed module @0x41d20f06) cannot access class com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault (in module java.xml) because module java.xml does not export com.sun.org.apache.xml.internal.dtm.ref to unnamed module @0x41d20f06
In @kares's words:
all in all the change to the internal xalan would need more work but at the same time I am not sure it's worth the effort, given the reliance on internals ... [and the] need to adjust JRUBY_OPTS flags to get it working
The only other option that's been suggested is porting to Saxon. I'm not comfortable enough with these libraries to estimate how much effort that would be. Does anybody here want to hazard a guesstimate?
Are there any other options (to Xalan-J) we haven't considered?
:+1: and just to clarify: Java::JavaLang::IllegalAccessError
means users that would use Nokogiri with JDK Xalan would likely have to edit their JRUBY/JAVA_OPTS to --add-ones
for the internals to be accessible (assuming we can still access them despite not being exported).
The only other option that's been suggested is porting to Saxon. I'm not comfortable enough with these libraries to estimate how much effort that would be. Does anybody here want to hazard a guesstimate?
I guess ~ 80% should be easy but than there's a lot of internals used which might be impossible -> rewriting those parts with Saxon specific APIs. My raw guess would be 10 work days.
Are these illegal accesses from using reflection to dig inside Xalan? If so we might be able to reduce these by being less invasive, but perhaps that would hinder compatibility.
Moving to a supported external library like Saxon would definitely be the best option long term, but I agree with @kares it would involve at least a week or two of work.
Are these illegal accesses from using reflection to dig inside Xalan?
From what we've bumped into so far it was not reflection - simply using Xalan's classes (from the com.sun.org.apache.xml.internal
package) - on 11 it warns on 17 errors. As noted fiddling with --add-opens
might do some damage control but it feels like a breaking change for JRuby users ...
I think the flag we'd want is --add-exports
so that the java.xml
module exports the internal com.sun.org.apache.xml.internal
package to external consumers, but you're right... it's still a breaking change.
I am not familiar with the Xalan-related code in Nokogiri... is there a reason we can't just use the published XSLT APIs? Can one of you point out some example code that requires direct access to Xalan?
Poking around outside of Xalan, it really seems like Saxon is the only game in town, and for sure if you want XSL 2.0 or 3.0 support. Given the lifecycle of Xalan and the inaccessibility of the JDK version, I think we will need to bite the bullet and move to Saxon sooner or later.
FWIW, seems our ASF friends did cut a Xalan-J 2.7.3
release after 9 long years last week, PR at https://github.com/sparklemotion/nokogiri/pull/2873