nokogiri-diff icon indicating copy to clipboard operation
nokogiri-diff copied to clipboard

Changes to improve usability for testing?

Open leehambley opened this issue 11 years ago • 4 comments

Hey, following positive and communicative feedback on the other issue I wanted to pitch an idea.

I've come to nokogiri-diff for testing pages (we have a ICAP proxy that we are trying to test). We need to verify that certain pages have been correctly modified, here's our workflow:

curl -is http://wwwexample.com > www_example_com.orig
cp www_example_com.{orig,mod}

Modify the modified version to meet our spec requirements, and then test it something like this:

it "should catch them all" do
  get "/proxy/http://www.example.com"
  html_diff(last_response.body, mod_body).should be_empty
end

The mocks are done with Webmock, and load as one spec with any spec fixture.

The HTML diff method I wrote had to screen out some whitespace differences (no semantic value):

  def html_diff(one, two)
    Nokogiri::HTML(one).diff(
      Nokogiri::HTML(two)
    ).select do |change, node|
      !change.strip.empty?
    end.collect do |change, node|
      puts "#{change} at #{node.to_s} #{node.path}"
    end.to_a
  end

I'm not too happy with the implementation, but it more or less works:- unfortunately this is client work into which I can't afford to invest a lot of time into testing infrastructure (“Just ship it!” :shipit:) - but I'd like to get your feedback on whether you can see the gem adding those features?

I noticed as well, probably for the best that text nodes don't register as changes, just the attributes and hierarchy.

leehambley avatar Apr 20 '13 09:04 leehambley

Two options come to mind:

  1. add a :whitespace option (similar to :added) that would ignore whitespace text changes.
  2. Convert your testing code into an RSpec matcher/formatter.

postmodern avatar Apr 21 '13 06:04 postmodern

@postmodern I wasn't absolutely clear where the whitespace changes were coming from, does the Nokogiri DOM tree include whitespace nodes by default? (I expected they would be stripped, as in both HMTL and XML they are irrelevant)

Regarding option #2, that's absolutely a possibility, I just don't know a lot about rspec, specifically how to present sane error messages when the match fails.

leehambley avatar Apr 22 '13 12:04 leehambley

Nokogiri will parse whitespace as Nokogiri::XML::Text nodes. It might be possible to filter out blank Text nodes.

postmodern avatar Apr 22 '13 18:04 postmodern

The more elegant way would be to tell nokogiri to remove blank nodes using the parse options when you originally parse the xml.

doc = Nokogiri::XML(File.open("mydoc.xml")) do |config|
  config.strict.noblanks
end

annaswims avatar Oct 31 '13 20:10 annaswims