nokogiri
remove_namespaces! does not strip namespaces from a duplicated XML document.
ENV: jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) Client VM 1.7.0_25-b17 [Windows 7-x86] nokogiri (1.6.0 java)
Reproduced by:
require 'nokogiri'
# Create an xml document
dummy_xml = '<?xml version="1.0"?><root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> </root>'
a_doc = Nokogiri::XML::Document.parse(dummy_xml)
# Duplicate the document
duplicate = a_doc.dup
# Show namespaces
puts a_doc.namespaces
puts duplicate.namespaces
# Remove namespaces
a_doc.remove_namespaces!
duplicate.remove_namespaces!
# Show namespaces
puts a_doc.namespaces
puts duplicate.namespaces
Output:
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
When I run the code, my output appears correct:
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{}
I see. One of the sys admins must have helpfully aliased ruby to jruby. The problem appears to be restricted to jruby. I guess this is not a ticket for nokogiri and is instead an issue for jruby. I'll check if this occurs in the most recent jruby build and then see about tracking down the issue to submit to them. Sorry for the inconvenience.
grosscol@tang:~/Public$ jruby noko.rb
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
grosscol@tang:~/Public$ /usr/bin/ruby1.9.1 noko.rb
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{}
Thank you for taking the time to address my issue.
Sincerely, Colin
Colin A. Gross
On Tue, Feb 18, 2014 at 9:41 PM, lunchmeat wrote:
> When I run the code, my output appears correct:
> {"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
> {"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
> {}
> {}
Yes, this is definitely only in the jruby implementation. I've looked into this issue as well and have somewhat tracked down the problem (although not enough to fix it yet unfortunately). The issue might have something to do with shallow/deep copies in the java source code. The XML node itself gets deep-copied, because there's a prewritten function from another library that does that.
For some reason, though, when I looked at the object addresses, the data structure called the NokogiriNamespaceCache (which is used to store the data for the namespaces function) exists in two different locations. The first references the original, so it can be affected by cache-clearing in the original (you get some interesting effects in the code above if you switch the order around). The second is an independent copy that for some reason is used everywhere except in the remove_namespaces! function. The result is that if you print the XML itself, everything seems to be working fine because the cache isn't being used, but if you try to get the namespaces through the function call, the cache storing that info has not been updated at all.
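To make the description above concrete, here is a rough pure-Ruby toy model of that kind of stale cache. Everything in it is hypothetical: ToyDocument, @read_cache and @write_cache are names invented for illustration and are not Nokogiri's Java internals. It only assumes what is described above, namely that the tree is deep-copied, that lookups use an independent cache copy, and that the remove call clears a cache still shared with the original.

```ruby
# Toy model only: hypothetical classes, NOT Nokogiri's actual implementation.
class ToyDocument
  def initialize(declarations)
    @declarations = declarations.dup   # the namespace declarations on the "tree"
    @read_cache   = @declarations.dup  # consulted by #namespaces
    @write_cache  = @read_cache        # cleared by #remove_namespaces!
  end

  # Called by #dup after the instance variables have been shallow-copied.
  def initialize_copy(source)
    super
    @declarations = @declarations.dup  # the tree itself is deep-copied
    @read_cache   = @read_cache.dup    # lookups get an independent copy
    # @write_cache is left untouched, so it still points at the source's cache
  end

  # Lookups go through the cache and never inspect the tree.
  def namespaces
    @read_cache
  end

  # Serialization reads the tree directly and never consults the cache.
  def to_xml
    attrs = @declarations.map { |name, uri| %( #{name}="#{uri}") }.join
    "<root#{attrs}/>"
  end

  # Strips the tree, but clears only the cache referenced by @write_cache.
  def remove_namespaces!
    @declarations.clear
    @write_cache.clear
    self
  end
end

original = ToyDocument.new("xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance")
copy = original.dup

original.remove_namespaces!
copy.remove_namespaces!

puts original.namespaces.empty?  # => true
puts copy.namespaces.empty?      # => false (stale cache)
puts copy.to_xml                 # => <root/>
```

In this model the copy serializes without namespaces while its lookup still reports them, which matches the reported output. Calling remove_namespaces! on the copy first would instead empty the original's cache while leaving the original's tree intact, which may be the kind of order-dependent effect mentioned above.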
I will look into this more when I have the time. This might no longer be an issue for you, but it should probably be fixed.
The behavior has changed since this issue was opened; the failing test is now related to Document#dup behavior:
#! /usr/bin/env ruby
require 'nokogiri'
require 'minitest/spec'
require 'minitest/autorun'
describe "Document#dup" do
  it "copies namespaces" do
    xml = '<?xml version="1.0"?><root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> </root>'
    doc = Nokogiri::XML::Document.parse(xml)
    dup = doc.dup
    assert_equal(doc.namespaces, dup.namespaces)
  end
end
On JRuby we see:
# Running:
F
Finished in 0.222047s, 4.5036 runs/s, 4.5036 assertions/s.
1) Failure:
Document#dup#test_0001_copies namespaces [./1012-jruby-namespaces.rb:13]:
--- expected
+++ actual
@@ -1 +1 @@
-{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
+{}
1 runs, 1 assertions, 1 failures, 0 errors, 0 skips