Sathish
Some pages require basic authentication. Add support for specifying a username and password before crawling pages.
```
class Library
  html "http://mylibrary.com"
  username "foo"
  password "bar"
end
```
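
To illustrate what the requested `username`/`password` options would need to do under the hood, here is a minimal sketch using only Ruby's standard `net/http`. The helper name `build_authenticated_request` is hypothetical and not part of Scrapify; it only builds the request and does not send it.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of Scrapify): builds a GET request that
# carries HTTP basic authentication credentials.
def build_authenticated_request(url, username, password)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  # basic_auth sets the Authorization header to "Basic <base64 of user:pass>".
  request.basic_auth(username, password)
  request
end

request = build_authenticated_request("http://mylibrary.com", "foo", "bar")
puts request["Authorization"]
```

The request could then be sent with `Net::HTTP.start(host, port) { |http| http.request(request) }` and the response body handed to the HTML parser.
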
Support crawling authenticated content. Use Mechanize instead of Nokogiri to log in and retain session info before crawling content that requires authentication.
Example:
```
class Project
  html "http://jira.myproject.com"
  login "http://jira.myproject.com/login"...
```
Crawl data across pages by extracting from an array of HTML URLs.
```
class Book
  include Scrapify::Base
  html ["http://www.bookstore.com/fiction/1",
        "http://www.bookstore.com/fiction/2",
        "http://www.bookstore.com/fiction/3"]
end
```
Support an optional data type for each attribute.
Example:
```
class IMDB
  attribute :title, type: String
  attribute :votes, type: Integer
  attribute :rating, type: Decimal
  attribute :released_data, type: DateTime
end
```
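
One way the `type:` option could be honoured is a lookup table of converters applied to the scraped strings. The sketch below is an assumption about how that might work, not Scrapify's implementation; `COERCIONS` and `coerce` are hypothetical names, and `Float` stands in for the requested `Decimal`, which is not a built-in Ruby class.

```ruby
require "date"

# Hypothetical mapping from a declared type to a converter.
# Scraped values arrive as strings, so every converter takes a String.
COERCIONS = {
  String   => ->(text) { text.strip },
  Integer  => ->(text) { Integer(text.strip.delete(","), 10) }, # "1,234" -> 1234
  Float    => ->(text) { Float(text.strip) },                   # stand-in for Decimal
  DateTime => ->(text) { DateTime.parse(text) }
}.freeze

# Converts a scraped string to the declared type, or raises for unknown types.
def coerce(text, type)
  converter = COERCIONS.fetch(type) do
    raise ArgumentError, "unsupported type: #{type}"
  end
  converter.call(text)
end

votes = coerce("1,234", Integer)
rating = coerce("8.8", Float)
released = coerce("2010-07-16", DateTime)
```

Keeping the converters in a single table means a new type can be supported by adding one entry.
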
All paginated content has some way of moving to the next page. Accept an extra selector to fetch the next page, and crawl until the last page is reached.
Example:
```
class Book
  include...
```
Support crawling of paginated content using a placeholder for the page in the HTML URL and a range or array of pages.
Example:
```
class Book
  include Scrapify::Base
  html "http://www.bookstore.com/fiction/:page", page: 1..100
end
```
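
The placeholder expansion itself is straightforward. A minimal sketch, assuming the placeholder is the literal text `:page` and that `expand_page_urls` is a hypothetical helper rather than an existing Scrapify method:

```ruby
# Hypothetical helper: substitutes each page value for the ":page"
# placeholder and returns the resulting list of URLs.
# `pages` may be a Range or an Array, since both respond to #map.
def expand_page_urls(template, pages)
  pages.map { |page| template.sub(":page", page.to_s) }
end

urls = expand_page_urls("http://www.bookstore.com/fiction/:page", 1..3)
puts urls
```

The resulting array has the same shape as the explicit list of URLs, so both features could share one crawling path.
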
Support finding all records that match conditions, for example:
```
IMDB.all(director: 'Christopher Nolan')
```
OR
```
IMDB.where(director: 'Christopher Nolan')
```
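
As a rough sketch of the matching logic such a finder would need, the following filters already-scraped records by attribute equality. The `Movie` struct and the standalone `where` function are hypothetical stand-ins; a real implementation would live on the model class.

```ruby
# Hypothetical stand-in for scraped records; a real model would build
# these from parsed HTML.
Movie = Struct.new(:title, :director, keyword_init: true)

# Keeps the records whose attributes equal every given condition.
def where(records, conditions)
  records.select do |record|
    conditions.all? do |attribute, expected|
      record.public_send(attribute) == expected
    end
  end
end

movies = [
  Movie.new(title: "Inception", director: "Christopher Nolan"),
  Movie.new(title: "Heat", director: "Michael Mann"),
  Movie.new(title: "Memento", director: "Christopher Nolan")
]

nolan = where(movies, director: "Christopher Nolan")
puts nolan.map(&:title)
```

Multiple conditions are combined with AND, because every key in the hash must match.
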
It would be great if I could keep 'bundle exec pry-remote' running in a separate terminal and have it invoked on binding.remote_pry calls. Currently, there is no way to start pry-remote in...
Is it possible to generate a table in RTF with dynamic rows and columns?