django-dynamic-scraper
* add formrequest version of spider and checker
Hi, could you please provide more context on what this pull request is for?
Hi @holgerd77 ,
This is about using the form-based web authentication that Scrapy supports.
Here is an example: http://doc.scrapy.org/en/0.16/topics/request-response.html#request-usage-examples
To use these new classes, all you have to do is subclass FormRequestDjangoSpider instead of DjangoSpider and FormRequestDjangoChecker instead of DjangoChecker, and set a few parameters: the username, the password, and the names of their respective form input fields.
Here is a usage example:
spiders.py

```python
from dynamic_scraper.spiders.django_spider import DjangoSpider, FormRequestDjangoSpider
from ave.models import NewsWebsite, Article, ArticleItem


class ArticleSpider(FormRequestDjangoSpider):
    name = 'article_spider'

    def __init__(self, *args, **kwargs):
        self._set_ref_object(NewsWebsite, **kwargs)
        self.scraper = self.ref_object.scraper
        self.scrape_url = self.ref_object.url
        self.scheduler_runtime = self.ref_object.scraper_runtime
        self.scraped_obj_class = Article
        self.scraped_obj_item_class = ArticleItem
        kwargs['username'] = 'USERNAME'
        kwargs['password'] = 'PASSWORD'
        kwargs['username_form'] = 'username'
        kwargs['password_form'] = 'password'
        super(ArticleSpider, self).__init__(*args, **kwargs)
```
checkers.py

```python
from dynamic_scraper.spiders.django_checker import DjangoChecker, FormRequestDjangoChecker
from ave.models import Article


class ArticleChecker(FormRequestDjangoChecker):
    name = 'article_checker'

    def __init__(self, *args, **kwargs):
        self._set_ref_object(Article, **kwargs)
        self.scraper = self.ref_object.news_website.scraper
        self.scrape_url = self.ref_object.url
        self.scheduler_runtime = self.ref_object.checker_runtime
        kwargs['username'] = 'USERNAME'
        kwargs['password'] = 'PASSWORD'
        kwargs['username_form'] = 'username'
        kwargs['password_form'] = 'password'
        super(ArticleChecker, self).__init__(*args, **kwargs)
```
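As a side note on how the new base classes might consume those extra kwargs, here is a minimal, framework-free sketch. `FormAuthKwargs` and `init_form_auth` are assumptions for illustration only, not the actual DDS or Scrapy API; the returned dict just has the shape Scrapy's `FormRequest` expects for its `formdata` argument.

```python
# Hypothetical sketch: pop the auth kwargs before they reach the base
# spider's __init__ and build the formdata dict for a login FormRequest.
class FormAuthKwargs(object):
    def init_form_auth(self, kwargs):
        # Remove auth parameters so the base class never sees them.
        self.username = kwargs.pop('username')
        self.password = kwargs.pop('password')
        # Names of the HTML <input> fields on the login form.
        self.username_form = kwargs.pop('username_form', 'username')
        self.password_form = kwargs.pop('password_form', 'password')
        # Shape matches the formdata argument of scrapy.FormRequest.
        return {self.username_form: self.username,
                self.password_form: self.password}
```

With kwargs set as in the examples above, `init_form_auth` would return `{'username': 'USERNAME', 'password': 'PASSWORD'}` and leave the remaining kwargs free of the auth entries.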
I guess we could improve this by moving these parameters to the admin panel.
https://github.com/scrapy/loginform
I am using this and it works great.