anemone
Anemone web-spider framework
Hi, thanks for the useful gem! This PR suppresses the warnings below.
```
/BUNDLE_ROOT/ruby/2.7.0/gems/anemone-0.7.2/lib/anemone/page.rb:157: warning: URI.unescape is obsolete
/BUNDLE_ROOT/ruby/2.7.0/gems/anemone-0.7.2/lib/anemone/page.rb:157: warning: URI.escape is obsolete
```
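For context, one common way to avoid these warnings is to call the URI parser object directly instead of the obsolete module methods. This is a sketch of that general approach, not necessarily what this PR does; `escape_url` and `unescape_url` are hypothetical helper names, not Anemone methods.

```ruby
require 'uri'

# Hypothetical helpers (not part of Anemone): route escaping through the
# parser object instead of the obsolete URI.escape / URI.unescape.
def escape_url(str)
  URI::DEFAULT_PARSER.escape(str)
end

def unescape_url(str)
  URI::DEFAULT_PARSER.unescape(str)
end

escaped = escape_url('/a b')       # percent-encodes the space
restored = unescape_url(escaped)   # round-trips back to the input
```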
With this new release, we would be able to use a proxy with authentication.
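As a rough illustration of what proxy authentication looks like at the `Net::HTTP` level, the sketch below builds a connection object with proxy credentials without opening it. The helper name `proxied_http`, the host names, and the credentials are all made up for the example; how Anemone exposes these options is not shown here.

```ruby
require 'net/http'

# Hypothetical helper: build (but do not open) a connection that would be
# routed through an authenticated proxy.
def proxied_http(host, port, proxy)
  Net::HTTP.new(host, port,
                proxy[:host], proxy[:port],
                proxy[:user], proxy[:password])
end

http = proxied_http('example.com', 80,
                    host: 'proxy.example.com', port: 8080,
                    user: 'crawler', password: 'secret')
```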
Support PhantomJS
Hi, I needed to add some additional HTTP request headers and didn't see any support for that currently. Happy to change anything / add more specs to cover the change....
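To make the idea concrete, here is a minimal sketch of merging caller-supplied headers into a request using plain `Net::HTTP`. The helper `build_get` and the `DEFAULT_HEADERS` constant are assumptions for illustration only; the option name and wiring used in the PR itself may differ.

```ruby
require 'net/http'
require 'uri'

# Assumed defaults for the example; not taken from Anemone's source.
DEFAULT_HEADERS = { 'User-Agent' => 'Anemone' }.freeze

# Hypothetical helper: build a GET request whose headers are the defaults
# overridden or extended by whatever the caller passes in.
def build_get(url, extra_headers = {})
  Net::HTTP::Get.new(URI(url), DEFAULT_HEADERS.merge(extra_headers))
end

req = build_get('http://example.com/', 'Accept-Language' => 'en')
```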
I'm using Anemone. It's excellent! However, characters come out garbled if the page's charset is not UTF-8 or US-ASCII, so I want to support other charsets.
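The usual remedy for this kind of garbling is to transcode the response body from its declared charset to UTF-8. The sketch below shows that idea with Ruby's built-in `String#encode`; `to_utf8` is a hypothetical helper, and falling back to UTF-8 for a missing or unknown charset is an assumption rather than Anemone's behaviour.

```ruby
# Hypothetical helper: interpret the raw bytes in the declared charset and
# transcode them to UTF-8. Unknown or missing charsets are treated as UTF-8.
def to_utf8(body, charset)
  source = begin
    (charset && Encoding.find(charset)) || Encoding::UTF_8
  rescue ArgumentError
    Encoding::UTF_8
  end
  # Bytes that cannot be converted are replaced rather than raising.
  body.dup.force_encoding(source)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

text = to_utf8("caf\xE9".b, 'ISO-8859-1')
```

In a crawler, the charset would typically come from the `Content-Type` header or a `<meta>` tag in the page.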
For site-specific crawlers, it's fair enough to use `focus_crawl` like this:
```
anemone.focus_crawl do |page|
  if page.doc
    page.doc.search('.//a[@href]').map { |a| URI.parse(a[:href]) }
  else
    page.links
  end
end
```
However when using...
Fix for URI::InvalidURIError: bad URI(is not URI?). For example, URI('http://google.com/åäö') fails otherwise.
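A minimal sketch of how such a failure can be avoided: try a normal parse first and percent-encode the string only if the parser rejects it. `parse_lenient` is a hypothetical name, and this is not necessarily how the fix itself is implemented.

```ruby
require 'uri'

# Hypothetical helper: parse the URL as-is, and fall back to a
# percent-encoded copy when the parser raises InvalidURIError.
def parse_lenient(url)
  URI.parse(url)
rescue URI::InvalidURIError
  URI.parse(URI::DEFAULT_PARSER.escape(url))
end

uri = parse_lenient('http://google.com/åäö')
```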
Also, add this model anemone_page.rb in your app
``` ruby
class AnemonePage
  include Mongoid::Document
  field :url
  field :headers, type: Moped::BSON::Binary
  field :data, type: Moped::BSON::Binary
  field :body, type: Moped::BSON::Binary
  field :links,...
```
Hi! Thanks a lot for Anemone, guys! I was missing a feature, so I implemented it: basically the reverse of `skip_links_like`, which I have called `only_links_like`, to help the crawler to...
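The relationship between the two options can be sketched as a whitelist applied before a blacklist. The function `links_to_follow` and its keyword arguments are hypothetical and only illustrate the filtering logic, not the PR's actual code.

```ruby
# Hypothetical filter: keep links that match at least one "only" pattern
# (or all links when none are given), then drop any that match a "skip" pattern.
def links_to_follow(links, only_patterns: [], skip_patterns: [])
  links.select do |link|
    path = link.to_s
    allowed = only_patterns.empty? || only_patterns.any? { |re| path =~ re }
    allowed && skip_patterns.none? { |re| path =~ re }
  end
end

links = %w[/blog/1 /blog/feed /about]
selected = links_to_follow(links, only_patterns: [%r{\A/blog}], skip_patterns: [/feed/])
```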