deadoralive
Multiprocessing
The link checker spends much of its time waiting:
- Waiting to get resource IDs or URLs from CKAN
- Waiting to see whether checking a link succeeds or fails
- Waiting when posting a result back to CKAN
- Waiting in order to not hit the same domain too frequently
Whenever a resource check task is waiting on any of the above, the link checker could be getting on with another resource check.
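As a rough illustration of the idea, the sketch below runs several resource checks at once in a pool of worker processes, using Python's standard `multiprocessing` module. The names `check_resource` and `check_all`, the resource IDs, and the pool size are hypothetical and not taken from the link checker itself. The `time.sleep` call stands in for the real waiting on CKAN and on the link being checked.

```python
import multiprocessing
import time


def check_resource(resource_id):
    """Hypothetical stand-in for one resource check."""
    # Placeholder for the real waiting: fetching the URL from CKAN,
    # checking the link, and posting the result back.
    time.sleep(0.1)
    return {"resource_id": resource_id, "alive": True}


def check_all(resource_ids, processes=4):
    """Run the checks concurrently in a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # imap_unordered yields each result as soon as its worker finishes,
        # so one slow check does not hold up the others.
        return list(pool.imap_unordered(check_resource, resource_ids))


if __name__ == "__main__":
    results = check_all(["res-1", "res-2", "res-3", "res-4"])
    print(sorted(result["resource_id"] for result in results))
```

With four workers, the four simulated checks overlap instead of running back to back, so the total time is close to that of a single check.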
Note that we want rate limiting, so that when the link checker has multiple URLs on the same domain to check, it doesn't hit that domain too many times too quickly. Because checks run in separate processes, this rate limiting has to work across processes.
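One possible way to share rate-limiting state between processes is a `multiprocessing.Manager` dictionary guarded by a lock, as in the sketch below. This is only an assumption about how it could be done, not the project's actual design. The names `reserve_slot` and `rate_limited_check` and the `MIN_INTERVAL` value are hypothetical. Each worker reserves the next free time slot for a URL's domain while holding the lock, then sleeps until that slot outside the lock.

```python
import multiprocessing
import time
from urllib.parse import urlparse

MIN_INTERVAL = 1.0  # assumed minimum number of seconds between hits on one domain


def reserve_slot(next_allowed, lock, url, now, min_interval=MIN_INTERVAL):
    """Reserve the next free slot for the URL's domain; return seconds to wait."""
    domain = urlparse(url).netloc
    with lock:
        # The slot is either now or the earliest time the domain is free again.
        slot = max(now, next_allowed.get(domain, 0.0))
        # Push the domain's next free time forward for the following caller.
        next_allowed[domain] = slot + min_interval
    return slot - now


def rate_limited_check(job):
    """Hypothetical worker: wait for the reserved slot, then check the link."""
    next_allowed, lock, url = job
    # time.time() is used because wall-clock time is comparable across processes.
    time.sleep(reserve_slot(next_allowed, lock, url, time.time()))
    return url  # placeholder for the real link check


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.com/b", "http://example.org/c"]
    with multiprocessing.Manager() as manager:
        next_allowed = manager.dict()  # domain -> next allowed request time
        lock = manager.Lock()
        jobs = [(next_allowed, lock, url) for url in urls]
        with multiprocessing.Pool(processes=3) as pool:
            print(pool.map(rate_limited_check, jobs))
```

Sleeping outside the lock means a worker waiting on one domain does not block workers that are checking other domains.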