spidermon
spidermon copied to clipboard
issue: Wrong previous jobs count in ZyteJobsComparisonMonitor
Background
The method to get the previous jobs used in ZyteJobsComparisonMonitor can be seen here:
https://github.com/scrapinghub/spidermon/blob/a7e195f951a4b9a4837943dff0b2ae727b76f237/spidermon/contrib/scrapy/monitors/monitors.py#L557-L579
where number_of_jobs's value is defined by the SPIDERMON_JOBS_COMPARISON setting.
Issue
Supposedly, we should get the SPIDERMON_JOBS_COMPARISON number of previous jobs to compute the average, as stated in the docs, but we might get more because of how those jobs are iterated.
Debugging
- The first
_jobs = client.spider.jobs.list(..., count=number_of_jobs)call will getnumber_of_jobsat most, as it is expected. - The function then keeps iterating, skipping 1000 jobs ahead (start+= 1000) to fetch
#number_of_jobsmore.- That 1000 number might come from the fact that
client.spider.jobs.list()doesn't allow the return of more than 1000 jobs, so the intention might have been to paginate the results to get everything. But because we only want#number_of_jobs(which should be less than 1000, otherwise the function breaks), then this causes us to fetch more jobs than requested instead.
- That 1000 number might come from the fact that
- It does so until there are no more jobs available to list
For example, if some spider has 3500 jobs that meet the requested criteria and SPIDERMON_JOBS_COMPARISON is set to 10, then this will happen:
Round #0: len(_jobs) == 10 (client.spider.jobs.list(start=0, count=10))Round #1: len(_jobs) == 20 (client.spider.jobs.list(start=1000, count=10))Round #2: len(_jobs) == 30 (client.spider.jobs.list(start=2000, count=10))Round #3: len(_jobs) == 40 (client.spider.jobs.list(start=3000, count=10))Round #5: stop (client.spider.jobs.list(start=4000, count=10))
Resulting in 40 previous jobs, when only ten were requested.
Expected Behaviour
Only SPIDERMON_JOBS_COMPARISON number of jobs should be fetched as previous jobs, and those should be the latest ones that meet the requested criteria.