nokogiri-diff
nokogiri-diff copied to clipboard
Changes to improve usability for testing?
Hey, following positive and communicative feedback on the other issue I wanted to pitch an idea.
I've come to nokogiri-diff for testing pages (we have a ICAP proxy that we are trying to test). We need to verify that certain pages have been correctly modified, here's our workflow:
curl -is http://wwwexample.com > www_example_com.orig
cp www_example_com.{orig,mod}
Modify the modified version to meet our spec requirements, and then test it something like this:
it "should catch them all" do
get "/proxy/http://www.example.com"
html_diff(last_response.body, mod_body).should be_empty
end
The mocks are done with Webmock, and load as one spec with any spec fixture.
The HTML diff method I wrote had to screen out some whitespace differences (no semantic value):
def html_diff(one, two)
Nokogiri::HTML(one).diff(
Nokogiri::HTML(two)
).select do |change, node|
!change.strip.empty?
end.collect do |change, node|
puts "#{change} at #{node.to_s} #{node.path}"
end.to_a
end
I'm not too happy with the implementation, but it more or less works:- unfortunately this is client work into which I can't afford to invest a lot of time into testing infrastructure (“Just ship it!” :shipit:) - but I'd like to get your feedback on whether you can see the gem adding those features?
I noticed as well, probably for the best that text nodes don't register as changes, just the attributes and hierarchy.
Two options come to mind:
- add a
:whitespace
option (similar to:added
) that would ignore whitespace text changes. - Convert your testing code into an RSpec matcher/formatter.
@postmodern I wasn't absolutely clear where the whitespace changes were coming from, does the Nokogiri DOM tree include whitespace nodes by default? (I expected they would be stripped, as in both HMTL and XML they are irrelevant)
Regarding option #2, that's absolutely a possibility, I just don't know a lot about rspec, specifically how to present sane error messages when the match fails.
Nokogiri will parse whitespace as Nokogiri::XML::Text nodes. It might be possible to filter out blank Text nodes.
The more elegant way would be to tell nokogiri to remove blank nodes using the parse options when you originally parse the xml.
doc = Nokogiri::XML(File.open("mydoc.xml")) do |config|
config.strict.noblanks
end