scrapify issues

Support for Basic Authentication

2

Some pages require basic authentication. Add support for supplying a username and password before crawling pages:

```
class Library
  html "http://mylibrary.com"
  username "foo"
  password "bar"
end
```

sathish316
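As a rough illustration of what the requested `username`/`password` options would need to do under the hood, here is a minimal sketch using Ruby's standard `Net::HTTP`. The helper name `basic_auth_request` is hypothetical and not part of scrapify; the sketch only builds the request and sends nothing over the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of scrapify): builds a GET request that
# carries HTTP Basic Authentication credentials.
def basic_auth_request(url, username, password)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(username, password) # sets the Authorization header
  request
end

request = basic_auth_request("http://mylibrary.com", "foo", "bar")
puts request['Authorization'] # prints "Basic Zm9vOmJhcg=="
```

The header value is just `username:password` encoded as Base64, so this scheme should only be used over HTTPS.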

Support for Login with session

Support crawling of authenticated content. Use Mechanize instead of Nokogiri to log in and retain session info before crawling content that requires authentication. Example:

```
class Project
  html "http://jira.myproject.com"
  login "http://jira.myproject.com/login"...
```

sathish316
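The issue proposes Mechanize, which keeps session cookies automatically. To show the underlying idea of "retaining session info" without any gem, the sketch below turns the `Set-Cookie` values of a login response into the `Cookie` header for later requests. The helper `session_cookie_header` and the sample cookie values are assumptions made for illustration; with `Net::HTTP`, such values could be read via `response.get_fields('Set-Cookie')`.

```ruby
# Hypothetical helper (not part of scrapify or Mechanize): turns the
# Set-Cookie values from a login response into a Cookie request header.
def session_cookie_header(set_cookie_values)
  set_cookie_values
    .map { |value| value.split(';').first.strip } # keep only "name=value"
    .join('; ')
end

# Made-up cookie values standing in for a real login response.
set_cookies = ["JSESSIONID=abc123; Path=/; HttpOnly", "token=xyz; Path=/"]
cookie_header = session_cookie_header(set_cookies)
puts cookie_header # prints "JSESSIONID=abc123; token=xyz"
```

This ignores cookie attributes such as expiry, domain, and path, which a real cookie jar (including Mechanize's) takes into account.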

pagination using array of pages

Crawl data across pages by extracting from an array of html urls:

```
class Book
  include Scrapify::Base
  html ["http://www.bookstore.com/fiction/1",
        "http://www.bookstore.com/fiction/2",
        "http://www.bookstore.com/fiction/3"]
end
```

sathish316
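One way to picture crawling an array of urls is to fetch each page in turn and concatenate the records. In this sketch, `crawl_pages` and the stub fetcher are hypothetical names; the stub stands in for a real HTTP request plus parsing step so the example runs offline.

```ruby
# Hypothetical helper: fetches every url with the given fetcher and
# concatenates the records found on each page.
def crawl_pages(urls, fetcher)
  urls.flat_map { |url| fetcher.call(url) }
end

# Stub fetcher standing in for a real HTTP request plus parsing step.
fake_fetcher = ->(url) { ["book from #{url}"] }

urls = [
  "http://www.bookstore.com/fiction/1",
  "http://www.bookstore.com/fiction/2",
  "http://www.bookstore.com/fiction/3"
]
books = crawl_pages(urls, fake_fetcher)
puts books.length # prints 3
```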

Attribute data types

Support an optional datatype for each attribute. Example:

```
class IMDB
  attribute :title, type: String
  attribute :votes, type: Integer
  attribute :rating, type: Decimal
  attribute :released_data, type: DateTime
end
```

sathish316
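Scraped values arrive as strings, so a `type:` option would presumably need a coercion step. The following is a minimal sketch of such a step, not scrapify's implementation. `CASTERS` and `cast` are hypothetical names, and `Float` stands in for the issue's `Decimal` (`BigDecimal` would be the more precise choice).

```ruby
require 'date'

# Hypothetical coercion table mapping a declared type to a converter.
# Float stands in for the "Decimal" type named in the issue.
CASTERS = {
  String   => ->(raw) { raw.to_s.strip },
  Integer  => ->(raw) { Integer(raw.to_s.delete(','), 10) }, # drop thousands separators
  Float    => ->(raw) { Float(raw.to_s) },
  DateTime => ->(raw) { DateTime.parse(raw.to_s) }
}.freeze

# Converts a raw scraped string to the declared type.
def cast(raw, type)
  CASTERS.fetch(type).call(raw)
end

title    = cast("  Inception ", String)
votes    = cast("1,234,567", Integer)
rating   = cast("8.8", Float)
released = cast("2010-07-16", DateTime)
puts votes # prints 1234567
```

Using `Integer(...)` and `Float(...)` rather than `to_i` and `to_f` makes malformed values raise instead of silently becoming zero.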

pagination using next page selector

All paginated content has some way of moving to the next page. Accept an extra selector to fetch the next page, and crawl until the last page is reached. Example:

```
class Book
  include...
```

sathish316
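The crawl loop this issue describes can be sketched independently of any parser: keep following the next-page link until a page has none. Everything here is an assumption for illustration: `crawl_until_last`, the `page_loader` callable (which would apply the next-page selector in a real crawler), the `max_pages` safety cap, and the fake site data.

```ruby
# Hypothetical crawler: page_loader returns a hash with the page's :items
# and the :next_url found via the next-page selector (nil on the last page).
def crawl_until_last(start_url, page_loader, max_pages: 1000)
  items = []
  url = start_url
  pages = 0
  while url && pages < max_pages # the cap guards against pagination loops
    page = page_loader.call(url)
    items.concat(page[:items])
    url = page[:next_url]
    pages += 1
  end
  items
end

# Fake three-page site so the sketch runs without network access.
FAKE_SITE = {
  "/fiction/1" => { items: ["A"], next_url: "/fiction/2" },
  "/fiction/2" => { items: ["B"], next_url: "/fiction/3" },
  "/fiction/3" => { items: ["C"], next_url: nil }
}.freeze

books = crawl_until_last("/fiction/1", ->(url) { FAKE_SITE.fetch(url) })
p books # prints ["A", "B", "C"]
```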

pagination using placeholder for page and range/array of pages

Support crawling of paginated content using a placeholder for the page in the html url, plus a range or array of pages. Example:

```
class Book
  include Scrapify::Base
  html "http://www.bookstore.com/fiction/:page", page: 1..100
end
```

sathish316
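Expanding the `:page` placeholder amounts to a simple substitution over the range or array. A minimal sketch follows, with `expand_pages` as a hypothetical helper name rather than anything scrapify provides.

```ruby
# Hypothetical helper: substitutes each page value for the ":page" placeholder.
# Accepts a range or an array, since both respond to map.
def expand_pages(template, pages)
  pages.map { |page| template.sub(':page', page.to_s) }
end

urls = expand_pages("http://www.bookstore.com/fiction/:page", 1..100)
puts urls.first  # prints "http://www.bookstore.com/fiction/1"
puts urls.last   # prints "http://www.bookstore.com/fiction/100"
puts urls.length # prints 100
```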

find all with conditions

Support for find all with conditions like:

```
IMDB.all(director: 'Christopher Nolan')
```

OR

```
IMDB.where(director: 'Christopher Nolan')
```

sathish316
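Conceptually, a conditions hash filters the scraped records down to those whose attributes match every key. The sketch below shows that filtering on plain structs. The `Movie` struct and the standalone `where` function are hypothetical stand-ins for a scrapify model and its class method.

```ruby
# Hypothetical record type standing in for a scraped model.
Movie = Struct.new(:title, :director)

# Hypothetical filter: keeps records whose attributes equal every condition.
def where(records, conditions)
  records.select do |record|
    conditions.all? { |attribute, expected| record.public_send(attribute) == expected }
  end
end

movies = [
  Movie.new("Inception", "Christopher Nolan"),
  Movie.new("Pulp Fiction", "Quentin Tarantino"),
  Movie.new("Memento", "Christopher Nolan")
]
nolan = where(movies, director: 'Christopher Nolan')
p nolan.map(&:title) # prints ["Inception", "Memento"]
```

An empty conditions hash matches every record, which is consistent with `all` being callable with no arguments.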

Tolerance to malformed XML

1

Is Scrapify tolerant of errors in XML, such as unmatched tags? Cheers

franciscolourenco
