scrapify

ScrApify is a library for building APIs by scraping static sites and exposing the data as models or JSON APIs. It powers APIfy, which creates JSON APIs from any HTML or Wikipedia page.

Results 8 scrapify issues

Some pages require basic authentication. Add support for a username and password before crawling pages:
```
class Library
  html "http://mylibrary.com"
  username "foo"
  password "bar"
end
```
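Whatever DSL the library ends up with, the request itself could carry the credentials through Ruby's standard `Net::HTTP` basic-auth support. The sketch below is an assumption, not ScrApify code: `build_authenticated_request` is a hypothetical helper, and it only builds the request without sending it.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of ScrApify): builds a GET request that
# carries HTTP basic-auth credentials. Nothing is sent over the network.
def build_authenticated_request(url, username, password)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(username, password) # sets the Authorization header
  request
end

request = build_authenticated_request("http://mylibrary.com", "foo", "bar")
puts request["Authorization"]
```

The resulting request could then be handed to `Net::HTTP#request`, and the response body parsed as usual.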

Support for crawling authenticated content. Use Mechanize instead of Nokogiri to log in and retain session info before crawling content that requires authentication. Example:
```
class Project
  html "http://jira.myproject.com"
  login "http://jira.myproject.com/login"...
```

Crawl data across pages by extracting from an array of HTML URLs:
```
class Book
  include Scrapify::Base
  html ["http://www.bookstore.com/fiction/1", "http://www.bookstore.com/fiction/2", "http://www.bookstore.com/fiction/3"]
end
```
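A minimal sketch of how results from several URLs might be merged, assuming each page is fetched and parsed independently. `crawl_all`, `fetch`, and `extract` are hypothetical names, and the page bodies are made-up stand-ins so the example runs offline.

```ruby
# Hypothetical sketch: visit each URL in order and merge the extracted rows.
# `fetch` and `extract` are injected callables so the example runs offline;
# a real crawler would download and parse each page instead.
def crawl_all(urls, fetch:, extract:)
  urls.flat_map { |url| extract.call(fetch.call(url)) }
end

# Stand-in pages keyed by URL (made-up data for illustration only).
pages = {
  "http://www.bookstore.com/fiction/1" => "Dune|Emma",
  "http://www.bookstore.com/fiction/2" => "Ulysses"
}

titles = crawl_all(
  pages.keys,
  fetch: ->(url) { pages.fetch(url) },
  extract: ->(body) { body.split("|") }
)
p titles
```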

Support an optional datatype for each attribute. Example:
```
class IMDB
  attribute :title, type: String
  attribute :votes, type: Integer
  attribute :rating, type: Decimal
  attribute :released_data, type: DateTime
end
```
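One possible shape for the conversion step is a lookup table from declared type to converter. This is only a sketch under assumptions: `COERCIONS` and `coerce` are hypothetical names, and `BigDecimal` stands in for `Decimal`, which is not a core Ruby class.

```ruby
require "bigdecimal"
require "date"

# Hypothetical coercion table: maps a declared type to a converter for the
# raw text scraped from the page. BigDecimal stands in for the `Decimal`
# named in the issue, since Ruby has no core class by that name.
COERCIONS = {
  String     => ->(raw) { raw.strip },
  Integer    => ->(raw) { Integer(raw.delete(","), 10) },
  BigDecimal => ->(raw) { BigDecimal(raw.strip) },
  DateTime   => ->(raw) { DateTime.parse(raw) }
}.freeze

# Converts scraped text to the requested type, rejecting unknown types.
def coerce(raw, type)
  converter = COERCIONS.fetch(type) { raise ArgumentError, "unsupported type: #{type}" }
  converter.call(raw)
end

p coerce("1,234,567", Integer)
p coerce("2010-07-16", DateTime).year
```

Raising on an unknown type keeps a typo in an attribute declaration from silently leaving values as strings.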

All paginated content has some way of moving to the next page. Accept an extra selector to fetch the next page and crawl until the last page is reached. Example:
```
class Book
  include...
```
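Since the example above is cut off, the sketch below does not try to complete it. It only illustrates the follow-the-next-link loop, under the assumption that each page yields either a next URL or nothing. `crawl_until_last`, `fetch`, and `next_url` are hypothetical names, and the pages are made-up stand-ins.

```ruby
require "set"

# Hypothetical sketch: keep following the "next" link until none is found.
# `fetch` returns a page body and `next_url` returns the next page's URL or nil
# (in a real crawler, that would come from applying the next-page selector).
def crawl_until_last(start_url, fetch:, next_url:)
  visited = Set.new
  bodies = []
  url = start_url
  while url && visited.add?(url) # add? returns nil for a repeat, ending the loop
    body = fetch.call(url)
    bodies << body
    url = next_url.call(body)
  end
  bodies
end

# Made-up pages: each body names the next page after "next=" if there is one.
pages = {
  "/fiction/1" => "page one next=/fiction/2",
  "/fiction/2" => "page two next=/fiction/3",
  "/fiction/3" => "page three"
}

bodies = crawl_until_last(
  "/fiction/1",
  fetch: ->(url) { pages.fetch(url) },
  next_url: ->(body) { body[/next=(\S+)/, 1] }
)
puts bodies.length
```

Tracking visited URLs guards against a site whose last page links back to an earlier one.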

Support crawling of paginated content using a placeholder for the page in the HTML URL and a range or array of pages. Example:
```
class Book
  include Scrapify::Base
  html "http://www.bookstore.com/fiction/:page", page: 1..100
end
```
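Expanding the placeholder could be as simple as one substitution per page number. A minimal sketch, assuming the placeholder is the literal text `:page`; `expand_pages` is a hypothetical helper, not an existing ScrApify method.

```ruby
# Hypothetical helper: substitutes each page number for the `:page`
# placeholder. `pages` may be a Range or an Array, as the issue proposes.
def expand_pages(template, pages)
  pages.map { |page| template.sub(":page", page.to_s) }
end

urls = expand_pages("http://www.bookstore.com/fiction/:page", 1..3)
puts urls
```

The resulting list has the same shape as an explicit array of URLs, so both forms could share one crawling path.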

Support for finding all records that match conditions, for example:
```
IMDB.all(director: 'Christopher Nolan')
```
OR
```
IMDB.where(director: 'Christopher Nolan')
```
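The matching itself could be a filter over already-scraped records in which every condition must hold. The sketch below assumes in-memory records; `Movie` and the standalone `where` are hypothetical, and the sample rows are illustrative rather than scraped.

```ruby
# Hypothetical sketch of condition matching over already-scraped records.
# Movie and its sample rows are illustrative only, not real scraped data.
Movie = Struct.new(:title, :director, keyword_init: true)

# Returns the records whose attributes equal every given condition.
def where(records, conditions)
  records.select do |record|
    conditions.all? { |attribute, expected| record[attribute] == expected }
  end
end

movies = [
  Movie.new(title: "Inception", director: "Christopher Nolan"),
  Movie.new(title: "Alien", director: "Ridley Scott"),
  Movie.new(title: "Memento", director: "Christopher Nolan")
]

matches = where(movies, director: "Christopher Nolan")
puts matches.map(&:title)
```

With an empty set of conditions the filter returns every record, which matches the usual meaning of a bare `all`.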

Is Scrapify tolerant of errors in XML, such as unmatched tags? Cheers