
I had difficulty figuring out how to use jsonschema

Open andrewbaxter opened this issue 6 years ago • 2 comments

Following https://spidermon.readthedocs.io/en/latest/item-validation.html#with-json-schema, I set up ITEM_PIPELINES and then jumped down to the jsonschema section, but it lists no settings for enabling jsonschema. I then scanned the rest of the page and set SPIDERMON_VALIDATION_SCHEMAS, since it seemed relevant.
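For reference, the two settings described above would look roughly like this in a Scrapy settings.py. This is a minimal sketch, not a confirmed fix: the pipeline priority (800) and the schema path are placeholders chosen for illustration, and nothing in this thread confirms that these two settings alone make the warning go away.

```python
# settings.py (sketch)

# Enable Spidermon's item validation pipeline.
# The priority value 800 is an arbitrary placeholder.
ITEM_PIPELINES = {
    "spidermon.contrib.scrapy.pipelines.ItemValidationPipeline": 800,
}

# Point the pipeline at one or more JSON Schema files.
# "/path/to/schema.json" is a hypothetical path; replace it with a real file.
SPIDERMON_VALIDATION_SCHEMAS = [
    "/path/to/schema.json",
]
```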

The jsonschema section links to https://spidermon.readthedocs.io/en/latest/getting-started.html, but that page doesn't have any additional information on setting up jsonschema.

Running the job I just get:

2019-04-05 19:51:53 [scrapy.middleware] WARNING: Disabled ItemValidationPipeline: No validators were found

All references to validators in the linked documentation seem to relate to schematics, which AFAIK I'm not using.

Also, for what it's worth, I'd prefer a fatal error here, since the project is detectably misconfigured (pipeline enabled but required settings not set properly) and I wouldn't want to accidentally run the spider without validation.

andrewbaxter, Apr 05 '19 11:04

Updating the docs should be fine, but the FATAL part is a bit tricky. The current behavior raises NotConfigured, which is the standard exception in Scrapy for this situation. So, should we raise some other exception, log a FATAL message there, or add something to Scrapy to break if something is not configured?

ejulio, Oct 28 '19 17:10

In this case, I think someone who adds ItemValidationPipeline to the pipelines list presumably wants to perform validation. If they don't have any validators configured, it's more likely a mistake than a signal that they don't want validation. IMO the best option is to raise a different exception, one that wouldn't be caught here, so that the job aborts instead of running unexpectedly without validation.

If adding the pipeline but disabling it is an expected use case, it might be better to make that explicit with a separate ITEM_VALIDATION_ENABLED setting which triggers the NotConfigured exception.
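To make the proposal concrete, here is a rough sketch of the check. It is not Spidermon's actual code: check_validation_config and ValidationMisconfigured are hypothetical names, ITEM_VALIDATION_ENABLED is the setting suggested above and does not exist today, and NotConfigured is a local stand-in for scrapy.exceptions.NotConfigured so the snippet runs without Scrapy installed.

```python
class NotConfigured(Exception):
    """Stand-in for scrapy.exceptions.NotConfigured (assumption: used only
    so this sketch is self-contained)."""


class ValidationMisconfigured(Exception):
    """Hypothetical exception for a pipeline that is enabled but has no validators."""


def check_validation_config(settings, validators):
    """Decide what to do when the validation pipeline is being set up.

    settings   -- a plain dict standing in for the Scrapy settings object
    validators -- the list of validators built from the configured schemas
    """
    # Explicit opt-out via the proposed (hypothetical) setting:
    # this is the only case that quietly disables the pipeline.
    if not settings.get("ITEM_VALIDATION_ENABLED", True):
        raise NotConfigured("Item validation explicitly disabled")

    # Pipeline enabled but nothing to validate with: treat it as a
    # misconfiguration and raise something other than NotConfigured.
    if not validators:
        raise ValidationMisconfigured(
            "ItemValidationPipeline is enabled but no validators were found"
        )

    return validators
```

With this split, an empty validator list fails loudly, while setting ITEM_VALIDATION_ENABLED to False keeps the current disable-with-a-warning behavior for people who rely on it.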

andrewbaxter, Oct 29 '19 10:10