Exceptions in middleware don't return exit code 1 in `scrapy crawl` & `scrapy check`
### Description
If a middleware raises an exception, running `scrapy crawl` or `scrapy check` prints the traceback to the shell but exits with code 0 instead of the expected 1.
### Steps to Reproduce
- Set up the tutorial up to here.
- Create a minimal middleware raising an exception in `middlewares.py`:

  ```python
  class BreakingMiddleware:
      def __init__(self):
          raise Exception("uhoh")
  ```
- Add the middleware to the `quotes` spider and a contract for the `parse` function:

  ```python
  import scrapy


  class QuotesSpider(scrapy.Spider):
      name = "quotes"
      custom_settings = {
          "SPIDER_MIDDLEWARES": {
              "tutorial.middlewares.BreakingMiddleware": 100,
          }
      }

      def start_requests(self):
          urls = [
              "http://quotes.toscrape.com/page/1/",
              "http://quotes.toscrape.com/page/2/",
          ]
          for url in urls:
              yield scrapy.Request(url=url, callback=self.parse)

      def parse(self, response):
          """
          @url http://quotes.toscrape.com/page/1/
          @returns items 10 10
          @returns requests 10 10
          """
          page = response.url.split("/")[-2]
          filename = "quotes-%s.html" % page
          with open(filename, "wb") as f:
              f.write(response.body)
          self.log("Saved file %s" % filename)
  ```
- Execute `scrapy check` or `scrapy crawl quotes`.
- Execute `echo $?`.
**Expected behavior:** Exit code 1

**Actual behavior:** Exit code 0

**Reproduces how often:** 100%
### Versions

```
Scrapy       : 1.8.0
lxml         : 4.4.2.0
libxml2      : 2.9.4
cssselect    : 1.1.0
parsel       : 1.5.2
w3lib        : 1.21.0
Twisted      : 19.10.0
Python       : 3.8.0 (default, Nov 26 2019, 14:40:47) - [Clang 10.0.1 (clang-1001.0.46.4)]
pyOpenSSL    : 19.1.0 (OpenSSL 1.1.1d 10 Sep 2019)
cryptography : 2.8
Platform     : macOS-10.15.2-x86_64-i386-64bit
```
### Additional context
`scrapy check` logs:

```
----------------------------------------------------------------------
Ran 0 contracts in 0.000s

OK
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 184, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 188, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 86, in crawl
    self.engine = self._create_engine()
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 111, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/core/scraper.py", line 69, in __init__
    self.spidermw = SpiderMiddlewareManager.from_crawler(crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 146, in create_instance
    return objcls(*args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/tutorial/tutorial/middlewares.py", line 10, in __init__
    raise Exception("uhoh")
builtins.Exception: uhoh
```
`scrapy crawl quotes` logs:

```
2020-01-28 11:29:40 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: tutorial)
2020-01-28 11:29:40 [scrapy.utils.log] INFO: Versions: lxml 4.4.2.0, libxml2 2.9.4, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.8.0 (default, Nov 26 2019, 14:40:47) - [Clang 10.0.1 (clang-1001.0.46.4)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1d 10 Sep 2019), cryptography 2.8, Platform macOS-10.15.2-x86_64-i386-64bit
2020-01-28 11:29:40 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2020-01-28 11:29:40 [scrapy.extensions.telnet] INFO: Telnet Password: c7073899ef38fd40
2020-01-28 11:29:40 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2020-01-28 11:29:40 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
Unhandled error in Deferred:
2020-01-28 11:29:40 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 184, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 188, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 86, in crawl
    self.engine = self._create_engine()
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 111, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/core/scraper.py", line 69, in __init__
    self.spidermw = SpiderMiddlewareManager.from_crawler(crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 146, in create_instance
    return objcls(*args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/tutorial/tutorial/middlewares.py", line 10, in __init__
    raise Exception("uhoh")
builtins.Exception: uhoh
2020-01-28 11:29:40 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 86, in crawl
    self.engine = self._create_engine()
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/crawler.py", line 111, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/core/scraper.py", line 69, in __init__
    self.spidermw = SpiderMiddlewareManager.from_crawler(crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/Users/dpf/Public/break-scrapy-check/py3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 146, in create_instance
    return objcls(*args, **kwargs)
  File "/Users/dpf/Public/break-scrapy-check/tutorial/tutorial/middlewares.py", line 10, in __init__
    raise Exception("uhoh")
Exception: uhoh
```
I think that's intended, because a crawl doesn't stop on these exceptions. It's the same as with exceptions in request callbacks: they're logged, but the crawl continues.
What about `scrapy check` returning 0 even when the check could not be performed?
I'm seeing the same issue: `scrapy check` fails but the exit code is still 0, so when it runs in CI/CD (for us, Bitbucket Pipelines) the error goes unnoticed.
Have you found a way to solve that?
Any update on this? I think Scrapy should be configurable to return exit code 1, so that orchestrators like Airflow know something went wrong and stop the dependent tasks.
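Until this is fixed upstream, a CI step could guard against it by scanning Scrapy's output for Twisted's unhandled-error marker, which appears in the logs above even though the process exits 0. A minimal sketch: the `run_with_strict_exit` helper is hypothetical (not Scrapy API), and the marker string is an assumption that may change between Twisted versions.

```python
import subprocess
import sys


def run_with_strict_exit(cmd):
    """Run a command; return its exit status, but force status 1 if the
    process exited 0 while its output still shows an unhandled Twisted
    error (as scrapy check / scrapy crawl do in this report)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    # Re-emit the captured output so the CI log stays readable.
    sys.stdout.write(output)
    if proc.returncode == 0 and "Unhandled error in Deferred" in output:
        return 1
    return proc.returncode
```

In a pipeline step this could be invoked as `sys.exit(run_with_strict_exit(["scrapy", "check"]))`. It is a string match on log output, not a real fix, so it breaks if the marker text ever changes.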