graboid icon indicating copy to clipboard operation
graboid copied to clipboard

Simply awesome web scraping with Nokogiri

Graboid

Graboid

Simply awesome web scraping. Better docs later. See specs.

0.3.4 Update

http://twoism.posterous.com/new-graboid-dsl

Installation

gem install nokogiri graboid

Usage

%w{rubygems graboid}.each { |f| require f }

class RedditEntry
  include Graboid::Scraper

  selector '.entry'

  set :title
  set :domain, :selector => '.domain a'
  
  set :link,   :selector => '.title' do |entry| 
    entry.css('a').first['href'] 
  end

  page_with do |doc|
    doc.css('p.nextprev a').select{|a| a.text =~ /next/i  }.first['href']
  end

  before_paginate do
    puts "opening page: #{self.source}"
    puts "collection size: #{self.collection.length}"
    puts "#{"*"*100}"
  end

end

@posts = RedditEntry.new( :source => 'http://reddit.com' ).all( :max_pages => 2 )

@posts.each do |p| 
  puts "title: #{p.title}"
  puts "domain: #{p.domain}"
  puts "link: #{p.link}"
  puts "#{"*"*100}"
end

##Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2010 Christopher Burnett. See LICENSE for details.