
Runtime spider creation

Open Ziinc opened this issue 4 years ago • 1 comments

This PR aims to allow for the easy creation of new spiders at runtime, utilizing modules that define the Crawly.Spider behaviour as spider templates.

Considered success if:

  1. Can create multiple spiders at runtime from the engine using the same spider template.
  2. Can stop the runtime spider from the engine.
  3. Can scrape items normally with a runtime spider.

Stretch:

  • Can provide initialization options to the runtime spider, which can then be consumed by the spider template

Initial discussion here: https://github.com/oltarasenko/crawly/issues/141#issuecomment-735245579

Main implementation notes:

  • Use strings as spider names, so that each instantiated spider can be referenced by name.
  • Allow a template option that specifies the spider module to use as the template.
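
The two notes above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: `Sketch.Template`, `Sketch.BlogSpider`, `Sketch.Engine`, and the `:url` option are all hypothetical names, and the behaviour here is a stand-in rather than Crawly.Spider's actual callbacks.

```elixir
defmodule Sketch.Template do
  # Hypothetical stand-in for a spider behaviour; not Crawly's actual callbacks.
  @callback init(opts :: keyword()) :: keyword()
end

defmodule Sketch.BlogSpider do
  @behaviour Sketch.Template

  @impl true
  def init(opts) do
    # The template consumes runtime options to build its start URLs.
    [start_urls: [Keyword.fetch!(opts, :url)]]
  end
end

defmodule Sketch.Engine do
  use Agent

  # State maps a string spider name to {template_module, init_result}.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Start a runtime spider under a string name, using :template as its module.
  def start_spider(name, opts) when is_binary(name) do
    template = Keyword.fetch!(opts, :template)
    init_result = template.init(Keyword.delete(opts, :template))

    Agent.get_and_update(__MODULE__, fn spiders ->
      if Map.has_key?(spiders, name) do
        {{:error, :already_started}, spiders}
      else
        {:ok, Map.put(spiders, name, {template, init_result})}
      end
    end)
  end

  # Stop a runtime spider by its string name.
  def stop_spider(name) when is_binary(name) do
    Agent.get_and_update(__MODULE__, fn spiders ->
      case Map.pop(spiders, name) do
        {nil, _} -> {{:error, :not_running}, spiders}
        {_spider, rest} -> {:ok, rest}
      end
    end)
  end

  # List the names of all running spiders.
  def running, do: Agent.get(__MODULE__, &Map.keys/1)
end
```

With this shape, `Sketch.Engine.start_spider("blog_a", template: Sketch.BlogSpider, url: "https://a.example")` and a second call with `"blog_b"` give two independent spiders backed by the same template module, and each can be stopped by its string name.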

Ziinc, Dec 10 '20 09:12

@oltarasenko For some reason I can't replicate the CircleCI test failures on my local machine; all tests pass on my side. In any case, do have a look at the changes:

  1. The Engine now starts and stops spiders asynchronously. I've refactored starting/stopping to be triggered asynchronously so that the engine does not block while other processes down the tree retrieve spider info.
  2. I've tried to consolidate the tests and the Engine/Manager APIs to make them more consistent, so some functions have been flagged for deprecation.
  3. I've added 2 local registries for worker pool and manager name referencing, so that naming is decoupled from the spider template module. They're supervised in the application.ex supervision tree.
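
As a rough illustration of point 3, a process can be registered under a string spider name through Elixir's `Registry` and a `:via` tuple, so lookups no longer depend on the template module. This is a minimal sketch under assumed names (`Sketch.Manager`, `Sketch.ManagerRegistry`), not the modules or registry names used in this PR.

```elixir
defmodule Sketch.Manager do
  use GenServer

  # Hypothetical registry name; the PR's actual registry names are not shown here.
  @registry Sketch.ManagerRegistry

  # Register the manager under the string spider name, not the template module.
  def start_link({spider_name, template}) when is_binary(spider_name) do
    GenServer.start_link(__MODULE__, template, name: via(spider_name))
  end

  # Ask the manager registered under this name which template it runs.
  def template(spider_name), do: GenServer.call(via(spider_name), :template)

  # Resolve a spider name to its manager pid, or nil if none is registered.
  def whereis(spider_name) do
    case Registry.lookup(@registry, spider_name) do
      [{pid, _value}] -> pid
      [] -> nil
    end
  end

  defp via(spider_name), do: {:via, Registry, {@registry, spider_name}}

  @impl true
  def init(template), do: {:ok, template}

  @impl true
  def handle_call(:template, _from, template), do: {:reply, template, template}
end
```

The registry itself would sit in the supervision tree as a child such as `{Registry, keys: :unique, name: Sketch.ManagerRegistry}`. Because the keys are unique, two managers can share one template module as long as their spider names differ.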

I'll be adding more docs and trying to hunt down the 3 failing tests. Let me know what you think.

Ziinc, Feb 03 '21 17:02