nokogiri icon indicating copy to clipboard operation
nokogiri copied to clipboard

Explore replacements for Xalan-J

Open flavorjones opened this issue 5 years ago • 11 comments

See #1760 for more context.

The JRuby implementation of Nokogiri uses Xalan-J for XSLT support, but this thread reveals that development (and likely support) has stalled on that project.

A potential replacement, noted in the thread linked above, is Saxon. There may be other potential replacements as well.

The outcome intended for this issue is to do some exploration of replacing Xalan-J:

  • is there a replacement that will functionally support what Xalan-J supports today?
  • how difficult will a transition be?

Tagging @jvshahid and @kares for visibility

flavorjones avatar Dec 03 '18 22:12 flavorjones

I don't believe there's been much forward momentum on this, but another motivator to make a move came up recently in the form of a CVE for Xalan:

https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-qwq9-89rg-ww72

The comment here asks whether we might be able to transition (at least) to the shipping Xalan in OpenJDK, to which the answer is "probably": https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-qwq9-89rg-ww72#advisory-comment-74545

In any case this should probably be addressed sooner than later since Xalan-J has been EOL for many years now, and is only being maintained in JDK as part of JDK. Saxon may be the best path forward, but using the JDK Xalan may be a short-term way to avoid the super-stale external version.

headius avatar Aug 22 '22 20:08 headius

@headius Thanks for commenting. I may need help with the mechanics of switching to the OpenJDK version of Xalan. I've got a branch where I've updated the package paths, but there are a few places where the API has changed that I need to work through. I'll push it shortly.

flavorjones avatar Aug 22 '22 22:08 flavorjones

I've created a draft PR migrating to the JDK Xalan that needs help to get unstuck: https://github.com/sparklemotion/nokogiri/pull/2632

flavorjones avatar Aug 23 '22 14:08 flavorjones

Saxon may be the best path forward, but using the JDK Xalan may be a short-term way to avoid the super-stale external version.

@headius so the only downside is we will need to add-opens the internal com.sun.org.apache package? not sure if this is smt for JDK 17+, whether Nokogiri would be able to rely on those in the long-term ...

kares avatar Aug 24 '22 07:08 kares

#2632 is a PR that @kares and I collaborated on to try to move Nokogiri away from upstream Xalan to using the JDK internal packages.

Unfortunately, relying on JDK internals doesn't seem to work well (at least not without a bit more work and potentially breaking changes). Java 17 disallows usage of internals with this error:

Java::JavaLang::IllegalAccessError: superclass access check failed: class nokogiri.internals.XalanDTMManagerPatch (in unnamed module @0x41d20f06) cannot access class com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault (in module java.xml) because module java.xml does not export com.sun.org.apache.xml.internal.dtm.ref to unnamed module @0x41d20f06

In @kares's words:

all in all the change to the internal xalan would need more work but at the same time I am not sure it's worth the effort, given the reliance on internals ... [and the] need to adjust JRUBY_OPTS flags to get it working

The only other option that's been suggested is porting to Saxon. I'm not comfortable enough with these libraries to estimate how much effort that would be. Does anybody here want to hazard a guesstimate?

Are there any other options (to Xalan-J) we haven't considered?

flavorjones avatar Sep 09 '22 03:09 flavorjones

:+1: and just to clarify: Java::JavaLang::IllegalAccessError means users that would use Nokogiri with JDK Xalan would likely have to edit their JRUBY/JAVA_OPTS to --add-ones for the internals to be accessible (assuming we can still access them despite not being exported).

The only other option that's been suggested is porting to Saxon. I'm not comfortable enough with these libraries to estimate how much effort that would be. Does anybody here want to hazard a guesstimate?

I guess ~ 80% should be easy but than there's a lot of internals used which might be impossible -> rewriting those parts with Saxon specific APIs. My raw guess would be 10 work days.

kares avatar Sep 11 '22 15:09 kares

Are these illegal accesses from using reflection to dig inside Xalan? If so we might be able to reduce these by being less invasive, but perhaps that would hinder compatibility.

Moving to a supported external library like Saxon would definitely be the best option long term, but I agree with @kares it would involve at least a week or two of work.

headius avatar Sep 13 '22 19:09 headius

Are these illegal accesses from using reflection to dig inside Xalan?

From what we've bumped into so far it was not reflection - simply using Xalan's classes (from the com.sun.org.apache.xml.internal package) - on 11 it warns on 17 errors. As noted fiddling with --add-opens might do some damage control but it feels like a breaking change for JRuby users ...

kares avatar Sep 14 '22 08:09 kares

I think the flag we'd want is --add-exports so that the java.xml module exports the internal com.sun.org.apache.xml.internal package to external consumers, but you're right... it's still a breaking change.

I am not familiar with the Xalan-related code in Nokogiri... is there a reason we can't just use the published XSLT APIs? Can one of you point out some example code that requires direct access to Xalan?

headius avatar Sep 14 '22 14:09 headius

Poking around outside of Xalan, it really seems like Saxon is the only game in town, and for sure if you want XSL 2.0 or 3.0 support. Given the lifecycle of Xalan and the inaccessibility of the JDK version, I think we will need to bite the bullet and move to Saxon sooner or later.

headius avatar Sep 14 '22 14:09 headius

FWIW, seems our ASF friends did cut a Xalan-J 2.7.3 release after 9 long years last week, PR at https://github.com/sparklemotion/nokogiri/pull/2873

chadlwilson avatar May 07 '23 15:05 chadlwilson