spidermon

Scrapy extension for monitoring spider execution.

86 spidermon issues

Closes #416. Adds a setting that limits how deep field coverage is computed for nested dicts.
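The listing does not name the new setting, so the sketch below only illustrates the idea of a depth limit on nested-dict field paths. The `field_paths` function, its `max_depth` parameter, and the `/`-joined path format are hypothetical and not spidermon's actual implementation.

```python
from typing import Iterator, Optional


def field_paths(item: dict, max_depth: Optional[int] = None,
                prefix: str = "", depth: int = 1) -> Iterator[str]:
    """Yield "/"-joined paths for the keys of ``item``, descending into nested dicts.

    ``max_depth`` is a hypothetical knob: None follows every level,
    1 yields only top-level keys, 2 adds one level of nesting, and so on.
    """
    for key, value in item.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        yield path
        # Recurse into dict values only while we are still below the limit.
        if isinstance(value, dict) and (max_depth is None or depth < max_depth):
            yield from field_paths(value, max_depth, path, depth + 1)


item = {"name": "x", "address": {"city": "y", "geo": {"lat": 1}}}
print(list(field_paths(item)))               # every level
print(list(field_paths(item, max_depth=2)))  # stops below "address/geo"
```

With `max_depth=None` the walk reaches `address/geo/lat`; with `max_depth=2` it stops at `address/geo`, which keeps the number of tracked fields bounded for deeply nested items.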

```
  File "/Users/rvandam/.venv/ADD1D703-D26A-4DD9-BEE9-DA619F3B2F68/lib/python3.11/site-packages/scrapy/utils/defer.py", line 348, in maybeDeferred_coro
    result = f(*args, **kw)
  File "/Users/rvandam/.venv/ADD1D703-D26A-4DD9-BEE9-DA619F3B2F68/lib/python3.11/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/Users/rvandam/.venv/ADD1D703-D26A-4DD9-BEE9-DA619F3B2F68/lib/python3.11/site-packages/spidermon/contrib/scrapy/extensions.py", line 128, in spider_closed
    self._add_field_coverage_to_stats()
  File "/Users/rvandam/.venv/ADD1D703-D26A-4DD9-BEE9-DA619F3B2F68/lib/python3.11/site-packages/spidermon/contrib/scrapy/extensions.py", line...
```

Sometimes the monitor reports a failure when the coverage is slightly below the desired threshold, even though rounding it to 2 decimal places would make it pass. This PR fixes that issue. Example...
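A minimal sketch of the rounding comparison the PR describes, assuming coverage and threshold are both fractions between 0 and 1. The `coverage_ok` name and the `ndigits` default are illustrative, not taken from the PR.

```python
def coverage_ok(coverage: float, threshold: float, ndigits: int = 2) -> bool:
    """Compare coverage with the threshold after rounding both to ``ndigits`` places."""
    return round(coverage, ndigits) >= round(threshold, ndigits)


coverage = 8999 / 10000       # 0.8999, just under a 0.9 threshold
print(coverage >= 0.9)        # raw comparison fails
print(coverage_ok(coverage, 0.9))  # rounded comparison passes
```

Rounding both sides keeps the check symmetric; a coverage of 0.89 still fails against 0.9 because it differs in the second decimal place.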

We ran into a problem where another extension required our settings to be JSON-serializable, and serialization failed on the classes used for specifying JSON schemas. I'm separately working...
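The sketch below reproduces the kind of failure described, assuming the schemas are configured as a dict keyed by item classes. `ProductItem`, the schema path, and the `to_import_path` workaround are hypothetical; whether spidermon accepts import-path strings in place of classes is not confirmed by this listing.

```python
import json


class ProductItem:  # hypothetical item class used as a schema key
    pass


schemas = {ProductItem: "schemas/product.json"}


def to_import_path(cls: type) -> str:
    """Turn a class into a dotted-path string that JSON can encode."""
    return f"{cls.__module__}.{cls.__qualname__}"


try:
    json.dumps(schemas)  # dict keys must be str/int/float/bool/None, not type
except TypeError as exc:
    print(f"not serializable: {exc}")

serializable = {to_import_path(cls): path for cls, path in schemas.items()}
print(json.dumps(serializable))
```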

Closes #423. The error handling now adds a timezone when one is missing.
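A minimal sketch of "add a timezone if needed", assuming naive timestamps should be treated as UTC. The `ensure_aware` helper is hypothetical and not the code from the PR.

```python
from datetime import datetime, timezone, tzinfo


def ensure_aware(dt: datetime, assume: tzinfo = timezone.utc) -> datetime:
    """Return ``dt`` unchanged if it is timezone-aware, else attach ``assume``."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        # Naive datetime: interpret it in the assumed zone (UTC by default).
        return dt.replace(tzinfo=assume)
    return dt
```

Normalizing both operands this way lets naive and aware timestamps be subtracted or compared without a `TypeError`.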

Encountered this error while experimenting with [PeriodicItemCountMonitor](https://github.com/scrapinghub/spidermon/blob/377c4ab929a1f49bc2f78787ae2d7fa615948583/spidermon/contrib/scrapy/monitors/monitors.py#L610C7-L610C31):

```
======================================================================
ERROR: Periodic Item Count Increase Monitor/test_stat_monitor
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/vandamr/.venv/46205CF6-02DD-499A-9EB8-71D8CB0FEA73/lib/python3.11/site-packages/spidermon/contrib/scrapy/monitors/base.py", line 213, in test_stat_monitor
    threshold = self._get_threshold_value()...
```

In the current version of Scrapy, the code below breaks because the spider's start time is timezone-aware: https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/monitors/monitors.py#L497

```python
now = datetime.datetime.utcnow()
start_time = self.data.stats.get("start_time")
duration = now -...
```
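The failure can be reproduced with the standard library alone. The sketch assumes, as the issue states, that `start_time` is an aware UTC datetime; `start_time` here is a stand-in value, not read from Scrapy stats. It builds the naive "now" without calling `utcnow()` (deprecated since Python 3.12) but produces the same kind of naive UTC timestamp.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for the aware UTC start_time described in the issue.
start_time = datetime.now(timezone.utc) - timedelta(minutes=5)

# Naive UTC "now", equivalent to what datetime.utcnow() returns.
naive_now = datetime.now(timezone.utc).replace(tzinfo=None)
try:
    naive_now - start_time  # naive minus aware is not allowed
except TypeError as exc:
    print("naive - aware:", exc)

# Taking "now" as an aware UTC datetime keeps both operands comparable.
duration = datetime.now(timezone.utc) - start_time
print(duration >= timedelta(minutes=5))  # True
```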

## Background

Currently, the coverage monitor tracks and reports the coverage of all fields, including nested fields (specifically, keys inside dictionary values). It follows every level of nesting. This...
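To make "follows every level of nesting" concrete, here is a hypothetical sketch of per-field coverage over a batch of items. It assumes a field counts as covered when its key is present, and it uses `/`-joined paths. Both choices are assumptions, not spidermon's documented behavior.

```python
from collections import Counter
from typing import Dict, Iterable


def nested_field_coverage(items: Iterable[dict]) -> Dict[str, float]:
    """Fraction of items in which each field path is present, at every nesting level."""
    counts: Counter = Counter()
    total = 0

    def walk(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            counts[path] += 1
            if isinstance(value, dict):
                walk(value, path)  # no depth limit: descend into every level

    for item in items:
        total += 1
        walk(item, "")
    return {path: n / total for path, n in counts.items()} if total else {}


items = [{"name": "a", "address": {"city": "x"}}, {"name": "b", "address": {}}]
print(nested_field_coverage(items))
```

Every distinct nested key adds another tracked path, which is why unbounded nesting can produce a large number of coverage entries.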

Since https://github.com/scrapinghub/spidermon/pull/358, validating date fields with `jsonschema` no longer works as it did before. Spidermon was serializing date fields into strings (https://github.com/scrapinghub/spidermon/pull/358/files#diff-7937ac85a30630fe837b9c133f4459ee590680bb5dfce72775db6005f2b45f51L142), so when injected into jsonschema validators, the `date`...
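A hypothetical stand-in for the serialization step the issue refers to (the linked diff is not reproduced here). It converts `date` and `datetime` values to ISO 8601 strings before validation, so a schema expecting `"type": "string"` with a date format would receive strings rather than Python date objects. `serialize_dates` is an illustrative name.

```python
from datetime import date, datetime
from typing import Any


def serialize_dates(value: Any) -> Any:
    """Recursively replace date/datetime objects with ISO 8601 strings."""
    if isinstance(value, date):  # datetime is a subclass of date, so both match
        return value.isoformat()
    if isinstance(value, dict):
        return {k: serialize_dates(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_dates(v) for v in value]
    return value


item = {"published": date(2024, 1, 31), "events": [datetime(2024, 1, 31, 12, 0)]}
print(serialize_dates(item))
```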