django-dynamic-scraper

* add formrequest version of spider and checker

Open canercandan opened this issue 10 years ago • 3 comments

canercandan avatar Dec 17 '14 15:12 canercandan

Hi, could you please provide more context on what this pull request is for?

holgerd77 avatar Dec 17 '14 15:12 holgerd77

Hi @holgerd77 ,

This adds support for form-based web authentication (login forms), which Scrapy already supports through FormRequest.

Here is an example: http://doc.scrapy.org/en/0.16/topics/request-response.html#request-usage-examples

To use the two new classes, subclass FormRequestDjangoSpider instead of DjangoSpider and FormRequestDjangoChecker instead of DjangoChecker, then set a few parameters: the username, the password, and the names of their respective form input fields.

Here is a usage example:

spiders.py

from dynamic_scraper.spiders.django_spider import DjangoSpider, FormRequestDjangoSpider
from ave.models import NewsWebsite, Article, ArticleItem

class ArticleSpider(FormRequestDjangoSpider):

    name = 'article_spider'

    def __init__(self, *args, **kwargs):
        self._set_ref_object(NewsWebsite, **kwargs)
        self.scraper = self.ref_object.scraper
        self.scrape_url = self.ref_object.url
        self.scheduler_runtime = self.ref_object.scraper_runtime
        self.scraped_obj_class = Article
        self.scraped_obj_item_class = ArticleItem

        kwargs['username'] = 'USERNAME'
        kwargs['password'] = 'PASSWORD'
        kwargs['username_form'] = 'username'
        kwargs['password_form'] = 'password'

        super(ArticleSpider, self).__init__(self, *args, **kwargs)

checkers.py

from dynamic_scraper.spiders.django_checker import DjangoChecker, FormRequestDjangoChecker
from ave.models import Article

class ArticleChecker(FormRequestDjangoChecker):

    name = 'article_checker'

    def __init__(self, *args, **kwargs):
        self._set_ref_object(Article, **kwargs)
        self.scraper = self.ref_object.news_website.scraper
        self.scrape_url = self.ref_object.url
        self.scheduler_runtime = self.ref_object.checker_runtime

        kwargs['username'] = 'USERNAME'
        kwargs['password'] = 'PASSWORD'
        kwargs['username_form'] = 'username'
        kwargs['password_form'] = 'password'

        super(ArticleChecker, self).__init__(self, *args, **kwargs)
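
To make the role of the four kwargs clearer, here is a minimal sketch of how they could be turned into the form payload for a login request. Note that login_formdata is a hypothetical helper for illustration only. It is not part of this pull request or of django-dynamic-scraper, and the field names 'login' and 'passwd' are made up to show that the *_form values are the input names of the login form, not the credentials themselves.

```python
def login_formdata(kwargs):
    """Map the credentials onto the login form's input names.

    Hypothetical helper: the real classes may consume these kwargs differently.
    """
    return {
        kwargs['username_form']: kwargs['username'],
        kwargs['password_form']: kwargs['password'],
    }


# Example values; 'login' and 'passwd' stand in for whatever the
# target site's <input name="..."> attributes happen to be.
kwargs = {
    'username': 'USERNAME',
    'password': 'PASSWORD',
    'username_form': 'login',
    'password_form': 'passwd',
}

formdata = login_formdata(kwargs)

# In a Scrapy callback that receives the login page, a dict like this is
# typically passed on as:
#   scrapy.FormRequest.from_response(response, formdata=formdata, callback=...)
```

With Scrapy's FormRequest.from_response, any other fields already present in the page's form (hidden tokens, for instance) are pre-filled from the response, so only the two credential fields need to be supplied.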

I guess this could be improved by moving these parameters to the admin panel.

canercandan avatar Dec 17 '14 15:12 canercandan

https://github.com/scrapy/loginform

I am using this and it works great.

umrashrf avatar Oct 01 '19 20:10 umrashrf