red-datasets
wikipedia: increase REXML entity expansion limit during XML parsing
Calling Datasets::Wikipedia#each raises "entity expansion has grown too large (RuntimeError)". This error occurs because REXML now enforces an entity expansion limit, set by https://github.com/ruby/rexml/pull/187, and Datasets::Wikipedia#each exceeds that limit.
Raising the entity expansion limit is acceptable in Red Datasets because it needs to handle large datasets, so this change temporarily increases the limit during XML parsing.
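As a rough illustration of "temporarily" changing the limit, here is a minimal sketch that saves the current value, sets a larger one for the duration of a block, and restores it afterwards. It assumes the relevant setting is REXML::Security.entity_expansion_text_limit; the helper name and the example value are hypothetical and are not necessarily what this change uses.

```ruby
require "rexml/security"

# Hypothetical helper (name chosen for illustration only): runs the given
# block with a larger REXML entity expansion text limit, then restores the
# previous value even if the block raises.
def with_entity_expansion_text_limit(limit)
  original = REXML::Security.entity_expansion_text_limit
  REXML::Security.entity_expansion_text_limit = limit
  yield
ensure
  REXML::Security.entity_expansion_text_limit = original
end

# Example usage. The value 1_000_000 is an arbitrary placeholder, not a
# recommended setting.
with_entity_expansion_text_limit(1_000_000) do
  # XML parsing would go here; the parser should be created inside the
  # block so that it sees the raised limit.
  puts REXML::Security.entity_expansion_text_limit
end
```

Because the setting is global to the process, restoring it in ensure keeps the higher limit from leaking into unrelated XML parsing.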
require 'datasets'
wikipedia = Datasets::Wikipedia.new
wikipedia.each do |wiki|
  pp wiki
end
$ cd red-datasets && bundle && bundle exec ruby wiki
/home/otegami/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:560:in `block in unnormalize': entity expansion has grown too large (RuntimeError)