remove_namespaces! does not strip namespaces from a duplicated XML document.

Open grosscol opened this issue 10 years ago • 4 comments

ENV:
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) Client VM 1.7.0_25-b17 [Windows 7-x86]
nokogiri (1.6.0 java)

Reproduced by:

require 'nokogiri'

# Create an xml document
dummy_xml = '<?xml version="1.0"?><root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> </root>' 
a_doc = Nokogiri::XML::Document.parse(dummy_xml)

# Duplicate the document
duplicate = a_doc.dup

# Show namespaces
puts a_doc.namespaces
puts duplicate.namespaces

# Remove namespaces
a_doc.remove_namespaces!
duplicate.remove_namespaces!

# Show namespaces again
puts a_doc.namespaces
puts duplicate.namespaces

Output:

{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}

grosscol, Nov 27 '13 18:11

When I run the code, my output appears correct:

{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{}

lunchmeat, Feb 19 '14 02:02

I see. One of the sys admins must have helpfully aliased ruby to jruby. The problem appears to be restricted to JRuby, so I guess this is an issue for JRuby rather than a ticket for nokogiri. I'll check whether it still occurs in the most recent JRuby build and then see about tracking down the cause so I can report it to them. Sorry for the inconvenience.

grosscol@tang:~/Public$ jruby noko.rb
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}

grosscol@tang:~/Public$ /usr/bin/ruby1.9.1 noko.rb
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
{}
{}
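
When a `ruby` command may be aliased, it can help to have the script itself report which implementation is running it. A minimal sketch using the built-in `RUBY_ENGINE`, `RUBY_VERSION`, and `RUBY_PLATFORM` constants; the helper name `interpreter_summary` is made up for illustration:

```ruby
# Hypothetical helper: describes the Ruby implementation executing this script.
# RUBY_ENGINE is "jruby" under JRuby and "ruby" under the reference interpreter.
def interpreter_summary
  "#{RUBY_ENGINE} #{RUBY_VERSION} (#{RUBY_PLATFORM})"
end

puts interpreter_summary
```

Adding this line to the top of noko.rb would show at a glance whether a given run went through JRuby or not.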

Thank you for taking the time to address my issue.

Sincerely, Colin

Colin A. Gross


grosscol, Feb 19 '14 15:02

Yes, this is definitely only in the jruby implementation. I've looked into this issue as well and have somewhat tracked down the problem (although not enough to fix it yet unfortunately). The issue might have something to do with shallow/deep copies in the java source code. The XML node itself gets deep-copied, because there's a prewritten function from another library that does that.

For some reason, though, when I looked at the object addresses, the data structure called the NokogiriNamespaceCache (which is used to store the data for the namespaces function) exists in two different locations in the duplicate. The first references the original's cache, so it can be affected by cache-clearing in the original (you get some interesting effects in the code above if you switch the order around). The second is an independent copy that for some reason is used everywhere except in the remove_namespaces! function. The result is that if you print the XML itself, everything seems to be working fine because the cache isn't being used, but if you try to get the namespaces through the function call, the cache storing that info has not been updated at all.
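
The stale-cache behavior described above can be imitated with a small pure-Ruby toy model. This is only a sketch of the suspected mechanism, not Nokogiri's Java code: `ToyDocument`, its instance variables, and its `to_xml` output are all hypothetical stand-ins, and the real cause had not been confirmed at this point in the thread.

```ruby
# Toy model only. ToyDocument is hypothetical and is NOT Nokogiri's implementation.
class ToyDocument
  def initialize(declarations)
    @declarations = declarations.dup # stands in for xmlns attributes in the node tree
    @read_cache = declarations.dup   # cache consulted by #namespaces
    @clear_cache = @read_cache       # cache cleared by #remove_namespaces!
  end

  # Called by #dup. The tree data and the read cache get independent copies,
  # but @clear_cache is left pointing at the ORIGINAL document's cache.
  def initialize_copy(original)
    super
    @declarations = @declarations.dup
    @read_cache = @read_cache.dup
  end

  # Reads only from the cache, never from the tree.
  def namespaces
    @read_cache.dup
  end

  # Strips the declarations from the tree and clears what it believes is the cache.
  def remove_namespaces!
    @declarations.clear
    @clear_cache.clear
    self
  end

  # Serializes from the tree, so it never sees a stale cache.
  def to_xml
    attrs = @declarations.map { |name, uri| %( #{name}="#{uri}") }.join
    "<root#{attrs}/>"
  end
end

original = ToyDocument.new({ "xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance" })
copy = original.dup

original.remove_namespaces!
copy.remove_namespaces!

puts original.namespaces.inspect # empty: the original cleared its own cache
puts copy.namespaces.inspect     # still lists xmlns:xsi: the read cache was never cleared
puts copy.to_xml                 # serialized output has no namespace declaration
```

In this model the copy's serialized XML is clean while its `namespaces` result is stale, which mirrors the symptom in the report. Calling `copy.remove_namespaces!` before touching the original would instead empty the original's cache, one possible reading of the order-dependent effects mentioned above.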

I will look into this more when I have the time. This might no longer be an issue for you, but it should probably be fixed.

jyao6, Feb 20 '14 07:02

The behavior has changed since this issue was opened. The failing test is now related to Document#dup behavior:

#! /usr/bin/env ruby

require 'nokogiri'

require 'minitest/spec'
require 'minitest/autorun'

describe "Document#dup" do
  it "copies namespaces" do
    xml = '<?xml version="1.0"?><root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> </root>' 
    doc = Nokogiri::XML::Document.parse(xml)
    dup = doc.dup
    assert_equal(doc.namespaces, dup.namespaces)
  end
end

On JRuby we see:

# Running:

F

Finished in 0.222047s, 4.5036 runs/s, 4.5036 assertions/s.

  1) Failure:
Document#dup#test_0001_copies namespaces [./1012-jruby-namespaces.rb:13]:
--- expected
+++ actual
@@ -1 +1 @@
-{"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance"}
+{}


1 runs, 1 assertions, 1 failures, 0 errors, 0 skips

flavorjones, Dec 05 '21 19:12