engine_started vs spider_opened

Open kmike opened this issue 10 years ago • 4 comments

I think we should better explain the difference between the engine_started and spider_opened signals. There is now one engine per spider, so these signals look very similar. To make things worse, the documentation says they can fire in either order:

This signal may be fired after the spider_opened signal, depending on how the spider was started. So don’t rely on this signal getting fired before spider_opened.

I don't know what the difference is or when you would want to use engine_started instead of spider_opened :) Any ideas?

kmike avatar Oct 03 '15 11:10 kmike

Crawler.crawl() creates an engine, opens the spider, then starts the engine (signals sent: spider_opened -> engine_started). Scrapy shell, however, creates and starts the engine, then opens the spider at the first call to fetch() (signals sent: engine_started -> spider_opened).
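To see which order applies in a given setup, one option is a small extension that records both signals as they arrive. This is only a sketch: the class name SignalOrderRecorder and its order attribute are made up for illustration, and it relies on Scrapy passing a handler only the arguments it declares.

```python
import logging

logger = logging.getLogger(__name__)


class SignalOrderRecorder:
    """Hypothetical extension that records the order of the two start signals."""

    def __init__(self):
        # Signal names, in the order they were received.
        self.order = []

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class itself can be exercised without Scrapy installed.
        from scrapy import signals

        ext = cls()
        crawler.signals.connect(ext.engine_started, signal=signals.engine_started)
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def engine_started(self):
        self.order.append("engine_started")
        logger.info("signal order so far: %s", self.order)

    def spider_opened(self, spider):
        self.order.append("spider_opened")
        logger.info("signal order so far: %s", self.order)
```

It could be enabled through the EXTENSIONS setting, for example under an assumed path such as "myproject.extensions.SignalOrderRecorder", and then compared between a regular crawl and a shell session.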

Maybe the difference is useful for hooking functionality into the shell?

jdemaeyer avatar Nov 09 '15 01:11 jdemaeyer

Also part of this issue: spider_closed vs engine_stopped

Digenis avatar Dec 09 '15 11:12 Digenis

Do we have any documentation specifically about spider_closed vs engine_stopped? I am running my spiders with scrapyd and have a case where I need to export the spider process log file. I have written an extension that connects to the signals below, in this order, and performs some database operations and data uploads to S3.

crawler.signals.connect(extension.spider_opened_handler, signal=signals.spider_opened)
crawler.signals.connect(extension.spider_closed_handler, signal=signals.spider_closed)
crawler.signals.connect(extension.engine_stopped_handler, signal=signals.engine_stopped)

At the moment, I upload the log file to S3 in the engine_stopped handler of my extension, which seems to work. I am assuming that this signal fires only after spider_closed has fired and all of its handlers have finished. Is that the expected behavior?

Is there any case where spider_closed can fire after engine_stopped? I found a relevant ticket, but the following points were not clear to me:

  • order in which signals are fired
  • whether handlers completely execute before the next signal gets fired

shaunak-cisco avatar Sep 06 '24 07:09 shaunak-cisco

At a glance, it seems to me that spider_closed fires and all of its handlers are resolved first, and then engine_stopped fires and its handlers are resolved. I am not sure whether there are corner cases, though.
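Since corner cases are not ruled out, an extension could avoid depending silently on that order. The sketch below is an assumption-laden illustration, not the extension from the question: LogExportExtension and the upload callable are hypothetical names, and the default upload only logs instead of talking to S3. It remembers the close reason in the spider_closed handler and warns if engine_stopped arrives first.

```python
import logging

logger = logging.getLogger(__name__)


class LogExportExtension:
    """Hypothetical extension: note the close reason, export on engine_stopped."""

    def __init__(self, upload):
        # `upload` is a hypothetical callable standing in for the real S3 upload.
        self.upload = upload
        self.closed_reason = None

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class itself can be exercised without Scrapy installed.
        from scrapy import signals

        ext = cls(upload=lambda reason: logger.info("would upload log (reason=%s)", reason))
        crawler.signals.connect(ext.spider_closed_handler, signal=signals.spider_closed)
        crawler.signals.connect(ext.engine_stopped_handler, signal=signals.engine_stopped)
        return ext

    def spider_closed_handler(self, spider, reason):
        # Record that spider_closed has run, and why the spider closed.
        self.closed_reason = reason

    def engine_stopped_handler(self):
        if self.closed_reason is None:
            # The ordering assumption did not hold for this run.
            logger.warning("engine_stopped fired before spider_closed")
        self.upload(self.closed_reason)
```

If the warning never shows up in the scrapyd logs, that is some evidence (though not proof) that the assumed order holds for that deployment.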

Gallaecio avatar Sep 09 '24 07:09 Gallaecio