spidermon
feature: Added Cerberus as a new option for item validation
:hatching_chick: This brings an end to a great and fulfilling period of contributing to Spidermon and the Scrapy Project as part of Google Summer of Code 2019.
Google Summer of Code 2019 with The Scrapy Project
Project - Integrate Cerberus, solves #182, Project Description- Google Archive
Co-Org Admin - Cathal
Mentors - @rennerocha , @ejulio
Personal GSoC Blog - Mixster x GSoC
PSF Blog - https://blogs.python-gsoc.org/en/blogs/vipulgupta2048s-blog/
Description
- Spidermon is a recommended tool for monitoring spiders created using Scrapy. At the time, users could choose between two libraries for item validation rules: jsonschema and schematics. We wanted to provide a third option, Cerberus.
- Cerberus provides powerful yet simple and lightweight data validation functionality out of the box and is designed to be easily extensible, allowing for custom validation. It has no dependencies and is thoroughly tested on several Python versions.
- The goal of this project was to integrate, test, and enable Cerberus as a new item validation option available to users.
Deliverables & Work Done
- All code meets high quality standards: it is documented in detail, formatted with black, and well tested (Pull request: https://github.com/scrapinghub/spidermon/pull/201). This pull request includes:
- CerberusValidator() class for Item validation through Cerberus. (https://github.com/vipulgupta2048/spidermon/pull/2)
- Translator that maps Cerberus errors to the unified error messages shared with the other validation methods. (https://github.com/vipulgupta2048/spidermon/pull/4)
- Complete integration with Scrapy pipelines, working with raw schemas, URLs, and paths. (https://github.com/vipulgupta2048/spidermon/pull/5)
- Unit + integration tests for each component in place.
- Documentation for Cerberus Validation method. (https://github.com/vipulgupta2048/spidermon/pull/6)
- A detailed, well-documented tutorial covering almost every feature of Spidermon will be developed over the course of the summer as a reference for developers, and blog posts will be written.
- One blog post each week about Spidermon and what I experienced and learned through the project, published on Mixster x GSoC.
- For the community to track progress, a tracker was maintained with my latest developments, containing week-to-week updates and minutes of the mentor meetings (MoM). This helped maintain accountability and transparency.
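The error translator listed among the deliverables can be pictured roughly as follows. This is a sketch of the general idea only, not the code in the PR: the unified message constants, the pattern table, and the function names are all hypothetical, and the input is a plain dict shaped like the per-field error lists Cerberus reports.

```python
import re

# Hypothetical unified messages; the names Spidermon actually uses may differ.
MISSING_REQUIRED_FIELD = "Missing required field"
INVALID_TYPE = "Invalid type"
UNKNOWN_ERROR = "Unknown error"

# (pattern, unified message) pairs, checked in order.
TRANSLATIONS = [
    (re.compile(r"^required field$"), MISSING_REQUIRED_FIELD),
    (re.compile(r"^must be of type \w+$"), INVALID_TYPE),
]


def translate_message(message):
    """Return the unified message for one raw validator message."""
    for pattern, unified in TRANSLATIONS:
        if pattern.match(message):
            return unified
    return UNKNOWN_ERROR


def translate_errors(errors):
    """Map {field: [raw message, ...]} to {field: [unified message, ...]}."""
    return {
        field: [translate_message(message) for message in messages]
        for field, messages in errors.items()
    }


# Example input resembling Cerberus's default messages (assumed for illustration).
raw_errors = {"name": ["required field"], "age": ["must be of type integer"]}
translated = translate_errors(raw_errors)
print(translated)
```

Keeping the mapping in a single table means each validation backend only needs its own list of patterns, while monitors can rely on one shared set of messages.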
For system testing, one can use the pre-configured Quotes spider https://github.com/vipulgupta2048/testing_quotes and install Spidermon from the master branch of my fork.
I look forward to improving the source code even further through all your valued opinions, reviews, and comments. I would be happy to clarify any part of the work done.
This project was completed through long nights of reading and writing code, learning new concepts on the fly, and asking hundreds of questions on Slack, all duly answered by my mentors @ejulio and @rennerocha. Without their constant help, motivation, and guidance, completing this uphill task would not have been possible.
@vipulgupta2048, the build is failing. Check the logs in the Travis results to see the problems and update your PR.
We will not add a new built-in item validation library to Spidermon for now, so we will close this PR. If needed, it can be created as a separate package.