scrapy-inline-requests
Yielding requests with callbacks
Since version 3.0, there are restrictions when a Request yielded from the generator has a callback or errback set. Why is that? What was the reason for this change?
I have some spiders like this:
# -*- coding: utf-8 -*-
import json

import scrapy
from inline_requests import inline_requests


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    @inline_requests
    def parse(self, response):
        # No callback: the response is sent back into this generator.
        some_data = yield scrapy.Request('http://httpbin.org/headers')
        print(json.loads(some_data.body))
        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            # Callback set: this is the request that triggers the warning.
            yield scrapy.Request(response.urljoin(next_page_url), callback=self.parse_page)

    def parse_page(self, response):
        print(response.url)
        print("hello")
This still works fine, but it prints warnings:
2017-12-28 12:33:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://httpbin.org/headers> (referer: http://quotes.toscrape.com/)
{u'headers': {u'Accept-Language': u'en', u'Accept-Encoding': u'gzip,deflate,br', u'Host': u'httpbin.org', u'Accept': u'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', u'User-Agent': u'Scrapy/1.4.0 (+http://scrapy.org)', u'Connection': u'close', u'Referer': u'http://quotes.toscrape.com/'}}
2017-12-28 12:33:05 [py.warnings] WARNING: /home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/inline_requests/generator.py:59: UserWarning: Got a request with callback set, bypassing the generator wrapper. Generator may not be able to resume. <GET http://quotes.toscrape.com/page/2/>
"be able to resume. %s" % ret)
2017-12-28 12:33:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: http://httpbin.org/headers)
http://quotes.toscrape.com/page/2/
hello
What can happen if the generator is not able to resume? Is there a way to preserve the behavior from before 3.0 and suppress the warnings?
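The only workaround I can think of for the warnings is filtering them with the standard `warnings` module. This is just a rough sketch, not something the library documents: it assumes the warning really is a plain `UserWarning` whose message starts with the text shown in the log, and `silence_callback_warning` and `count_emitted` are names I made up for illustration. It only hides the message; it does not change how the generator behaves.

```python
import warnings

# Assumption: prefix copied from the warning text in the log output.
CALLBACK_WARNING_PREFIX = "Got a request with callback set"


def silence_callback_warning():
    """Ignore UserWarnings whose message starts with the prefix above."""
    warnings.filterwarnings(
        "ignore",
        message=CALLBACK_WARNING_PREFIX,  # regex matched at the start of the message
        category=UserWarning,
    )


def count_emitted(messages):
    """Hypothetical helper: emit each message as a UserWarning and
    return how many still get through once the filter is installed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        silence_callback_warning()
        for msg in messages:
            warnings.warn(msg, UserWarning)
    return len(caught)
```

In a spider module, calling `silence_callback_warning()` once at import time should be enough, since filtered warnings are dropped before they reach the `py.warnings` logger. Other warnings stay visible.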
@rmax