nokogiri.org
nokogiri.org copied to clipboard
Document for JRuby users the best practices for thread safety (doc.xpath is not thread-safe on JRuby)
somehow, I got the impression that xpath
is thread-safe (even under JRuby)
... or at least fixable (with the xpath-support cache) when its just an expr (no binds/fn-handler passed)
but was proven wrong as I added a test :
def test_concurrent_xpath
# skip("no need to run under MRI") unless Nokogiri.jruby?
doc = Nokogiri::XML(File.open(XPATH_FILE))
threads = []; errors = []
4.times do |i|
threads << Thread.start do
begin
sleep 0.01 * i
assert_equal 11, doc.xpath('.//category').size
assert_equal 1165, doc.xpath('//module').size
assert_equal 10353, doc.xpath('//module/param').size
assert_equal 14, doc.xpath('//param[@name="href"]').size
rescue Exception => ex
errors << ex
end
end
end
threads.each(&:join)
raise errors.first if errors.any?
end
it either reports incorrect results or in a better case fails with an error such as :
Nokogiri::XML::TestXPath#test_concurrent_xpath:
Java::JavaLang::ArrayIndexOutOfBoundsException: -1
org.apache.xml.utils.IntStack.push(IntStack.java:86)
org.apache.xpath.XPathContext.pushCurrentNode(XPathContext.java:793)
org.apache.xpath.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:468)
org.apache.xpath.axes.DescendantIterator.nextNode(DescendantIterator.java:222)
org.apache.xpath.axes.NodeSequence.nextNode(NodeSequence.java:335)
org.apache.xpath.axes.NodeSequence.runTo(NodeSequence.java:494)
org.apache.xml.dtm.ref.DTMNodeList.<init>(DTMNodeList.java:81)
org.apache.xpath.objects.XNodeSet.nodelist(XNodeSet.java:346)
nokogiri.XmlXpathContext.tryGetNodeSet(XmlXpathContext.java:190)
nokogiri.XmlXpathContext.node_set(XmlXpathContext.java:163)
nokogiri.XmlXpathContext.evaluate(XmlXpathContext.java:130)
this is caused by re-using the same cached xpath object between threads. on the other hand, maybe there never was an official guarantee regarding thread-safety.
@kares has submitted a fix for this in sparklemotion/nokogiri#1597
Ah, my mistake, sparklemotion/nokogiri#1597 does not contain a fix for this. Ignore my previous comment.
I've pushed a branch, 1596_kares_xpath-not-threadsafe
, with this failing test in it.
@flavorjones thanks, was more interested in your input on this (if you have any or know anyone who does).
somehow I assumed its meant to be (that doc.xpath
is "officially") thread-safe, but of course with the caching done as is I was proven wrong (doing sparklemotion/nokogiri#1597 I assumed that its only the binding cases that are problematic but its the whole underlying cached internal XPath structure). was thinking about introducing a system property to switch to a thread-local cache as it would make sense in some scenarios (but probably not to have it on by default) ?
basically current status is: doc.xpath
not thread-safe due the caching, without the caching its (terribly slow but) thread-safe. so the switch might have several states:
-
-Dnokogiri.xpath.cache=true
(default) -
-Dnokogiri.xpath.cache=thread
-
-Dnokogiri.xpath.cache=false
@jvshahid Any thoughts on this?
I've rebased the branch 1596_kares_xpath-not-threadsafe
just to try to keep this issue up to date. Still not sure what the right thing to do is.
was thinking about introducing a system property to switch to a thread-local cache as it would make sense in some scenarios (but probably not to have it on by default) ?
Why not ? I don't the idea of introducing more flags that the user has to discover and configure properly. I think Nokogiri should DTRT and have a thread local cache.
I researched this a little bit more. The underlying libraries aren't thread safe. Java DOM's specification doesn't require implementation and Xerces
certainly is indeed not thread safe according to their FAQ 1. I think the right thing to do is to document this fact somewhere on the docs site, i.e. best practices for concurrent usage of Nokogiri.
Updated title to indicate the plan is to document thread-safety limitations, and am deleting the branch with the failing test.
I don't the idea of introducing more flags that the user has to discover and configure properly. I think Nokogiri should DTRT and have a thread local cache.
:+1: should be the default despite it having a negative performance impact, some 'power' users might want to change to a non-thread-safe cache option or maybe not ...