crawly
Runtime spider creation
This PR aims to make it easy to create new spiders at runtime, using modules that implement the Crawly.Spider
behaviour as spider templates.
Considered a success if:
- Can create multiple spiders at runtime from the engine using the same spider template.
- Can stop the runtime spider from the engine.
- Can scrape items normally with a runtime spider.
Stretch:
- Can provide initialization options to the runtime spider, which can then be consumed by the spider template
Initial discussion here: https://github.com/oltarasenko/crawly/issues/141#issuecomment-735245579
Main implementation notes:
- Use string names to reference instantiated spiders.
- Allow a template option that specifies the spider module to use as a template.
@oltarasenko For some reason I can't replicate the CircleCI test failures on my local machine; all tests pass on my side. In any case, please have a look at the changes:
- The Engine now starts and stops spiders asynchronously. I've refactored starting/stopping to be triggered asynchronously so that the engine does not block when spider info is retrieved by other processes down the tree.
- I've tried to consolidate the tests and the Engine/Manager APIs to make them more consistent, so some functions have been flagged for deprecation.
- I've added two local registries for referencing worker pools and managers by name, so that naming is decoupled from the spider template module. They're supervised in the application.ex supervision tree.
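To illustrate the registry idea above, here is a minimal sketch that uses only Elixir's standard Registry and Agent modules. It is not the code in this PR: the module name RuntimeSpiderSketch, its functions, the registry name, and the MyTemplate module are all hypothetical, and an Agent stands in for whatever manager process a spider would actually have. It only shows how a string name, rather than the template module, can identify a running process.

```elixir
defmodule RuntimeSpiderSketch do
  # Hypothetical names throughout; none of this is Crawly's actual API.
  @registry RuntimeSpiderSketch.Registry

  # Start a unique-key registry so each string name maps to at most one process.
  def start_registry do
    Registry.start_link(keys: :unique, name: @registry)
  end

  # Build a :via tuple so a process can be addressed by its string name.
  def via(name) when is_binary(name), do: {:via, Registry, {@registry, name}}

  # Start a stand-in "manager" process for `name`, remembering its template module.
  def start_spider(name, template) when is_binary(name) and is_atom(template) do
    Agent.start_link(fn -> %{template: template} end, name: via(name))
  end

  # Look up which template a named spider was created from.
  def template_of(name), do: Agent.get(via(name), & &1.template)

  # Stop the process registered under `name`.
  def stop_spider(name), do: Agent.stop(via(name))
end

{:ok, _} = RuntimeSpiderSketch.start_registry()

# Two spiders created from the same (hypothetical) template module.
{:ok, _} = RuntimeSpiderSketch.start_spider("books-1", MyTemplate)
{:ok, _} = RuntimeSpiderSketch.start_spider("books-2", MyTemplate)
```

Because the registry key is the string name, several processes can share one template module without their names clashing, and starting a second process under an existing name returns an error tuple instead of silently replacing the first.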
I'll be adding more docs and trying to hunt down the 3 failing tests. Let me know what you think.