spidermon
spidermon copied to clipboard
Spidermon fails to add `_validation` field to the item.
Ubuntu 22.04, Python 3.10.9, Spidermon 1.17.1
If an item defined with attrs library and this condition https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L140 is True, then it fails here https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L141 with this message:
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: 'ProductList' object does not support item assignment
This is a limitation on the Item definition as it don't support adding new fields.
In theory, using slots=False should work according to the attr dosc, however it didn't work for me.
Alternatively, adding the field _validation to the class should remove the error, however, you will get _validation automatically populated to None on each item which is not the best solution either.
On my experience I had to just convert from Attrs class to dict before spidermon validaton.
@VMRuiz I had a hope that with this PR we wouldn't need this conversion-to-dict pipeline anymore.
That class is just an interface to simplify read and write access to the different types of Item classes. It doesn't add any additional functionality.
I don't know if we could overcharge it to support additional fields not defined on the original item class.
Maybe we can start by capturing the exception and raising a warning instead.
cc. @Gallaecio @rennerocha
Alternatively, adding the field
_validationto the class should remove the error
I think this is the way to go. You either disable the creation of this _validation field, if Spidermon allows for that, or you add the field to your item class.
@Gallaecio Shall we close this ticket as wont fix or similar?
Well, I think there might be an action here even if we go with https://github.com/scrapinghub/spidermon/issues/379#issuecomment-1385349452.
- If the
_validationfield cannot be disabled without disabling validation itself (I don’t know if that is the case), then that’s something to support. - We should document this scenario, i.e. the need to add
_validationto an item class that does not support defining unknown fields, and the option to disable the generation of that field instead.
This can be turn off with SPIDERMON_VALIDATION_ADD_ERRORS_TO_ITEMS - https://spidermon.readthedocs.io/en/latest/item-validation.html#spidermon-validation-add-errors-to-items
We may expand the docs to explicitly state that if enabled, it will only work on items that have defined the _validation field or that support adding undefined fields, like dicts .
Edit: Additionally, SPIDERMON_VALIDATION_ERRORS_FIELD can be use to replace the _validation field with any other field name.
This problem was added by https://github.com/scrapinghub/spidermon/pull/358 and it is a backward incompatible change that we didn't inform our users that it was introduced. So we need to do something about it. Probably one new release that indicates that _validation field will only work if we are dealing with dict item.
We already have this situation in Scrapy related to files and files_urls field for the media pipelines (https://docs.scrapy.org/en/latest/topics/media-pipeline.html#usage-example). So adding the information that we need to specify _validation should be enough. However, I believe that we should put as a warning, not just in the middle of the text.