nokogiri
Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
We're updating our app from CRuby to JRuby, so we're swapping to the Java version of the nokogiri gem. I'm seeing some test failures, and the root cause seems to...
See #1063 for context. MRI has an optional second argument for `Node#dup` that the JRuby implementation does not have.
See #1760 for more context. The JRuby implementation of Nokogiri uses [Xalan-J](https://xml.apache.org/xalan-j/) for XSLT support, but [this thread](https://mail-archives.apache.org/mod_mbox/xalan-dev/201810.mbox/browser) reveals that development (and likely support) has stalled on that project. A...
From an email from Devlin Daley to nokogiri-talk:

> If I had any C extension fu I would add what I think is an
> awesome approach to Nokogiri for...
Using `~/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.1`. In general, Ruby's Enumerable methods, like `Array#select`, return a new Array, leaving the original untouched. For example, ```ruby x = [*1..10] #=> [1, 2, 3, 4, 5,...
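A minimal sketch of the core-Ruby behaviour the report above refers to, using only plain arrays (no Nokogiri involved). The variable names are illustrative and not taken from the original issue.

```ruby
# Core Ruby only: contrast the non-destructive and destructive forms of select.
numbers = [*1..10]

# Array#select returns a new Array and leaves the receiver untouched.
evens = numbers.select(&:even?)

# The bang variant, Array#select!, filters the receiver in place.
in_place = numbers.dup
in_place.select!(&:even?)

puts "numbers:  #{numbers.inspect}"
puts "evens:    #{evens.inspect}"
puts "in_place: #{in_place.inspect}"
```

Here `numbers` still holds all ten elements after the call to `select`, while `in_place` is reduced to the even values because `select!` mutates its receiver.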
Example:

``` ruby
require 'nokogiri'

reader = Nokogiri::XML::Reader(File.open 'entities.xml')
reader.each do |node|
  puts node.outer_xml if node.node_type == 1 && node.name == 'Main'
end
```

using this XML: ``` xml This...
Not all the namespaces referenced inside an XML node are carried over by `Node#add_child`. Reproducible test case: https://gist.github.com/gioele/2c88ac73f4f28f79fbc6#file-add_child_ns-rb Output: ``` == Original document aaa bbb == New document == The attribute...
The LostText class from cyberneko throws a ConcurrentModificationException when parsing some HTML. See the output below for environment information. Removing any part of the HTML in the sample test case succeeds...
Apparently Nokogiri does not, or does not always, support encodings declared via `meta[http-equiv=content-type]` tags. I built a workaround, https://gist.github.com/radiospiel/5148046, where `Nokogiri::HTML.with_meta_encoding(data)` reparses the document if there is a meta encoding...
This would allow validating the XML schema as the stream is parsed, with the SAX parsing method for example. Currently one has to validate/parse the whole XML schema upfront and then...