nutch
NUTCH-2947 Fetcher: keep state of empty fetch queues unless queue feeder is finished
Fetcher: keep the state of empty but stateful fetch queues until the queue feeder is finished, in order to ensure politeness. A queue is stateful if:
- its next fetch time has not yet been reached, or
- its exception counter is non-zero and the queue feeder is still adding new fetch items to the queues.
Only once the queue feeder is finished and no new fetch items are being added can these queues finally be removed.
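The retention rule described above can be sketched as a predicate on a single queue. This is a minimal sketch, not Nutch's actual queue implementation: the class `SketchFetchQueue`, its fields and the method `canBeRemoved` are hypothetical names chosen for illustration, and times are assumed to be epoch milliseconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of a per-host fetch queue (not Nutch's own class). */
class SketchFetchQueue {
    final Deque<String> items = new ArrayDeque<>(); // queued URLs waiting to be fetched
    int inProgress = 0;          // items currently being fetched
    long nextFetchTime = 0L;     // earliest time (ms) the next fetch is allowed
    int exceptionCounter = 0;    // exceptions seen for this queue so far

    /** Empty means nothing queued and nothing in flight. */
    boolean isEmpty() {
        return items.isEmpty() && inProgress == 0;
    }

    /** True if the queue carries state that politeness depends on. */
    boolean isStateful(long nowMs) {
        return nextFetchTime > nowMs || exceptionCounter > 0;
    }

    /**
     * A non-empty queue is never removed. An empty queue may be dropped
     * if it carries no state, or once the feeder has finished and no
     * new items can arrive for it.
     */
    boolean canBeRemoved(boolean feederFinished, long nowMs) {
        if (!isEmpty()) {
            return false;
        }
        return feederFinished || !isStateful(nowMs);
    }
}
```

For example, an empty queue whose `nextFetchTime` lies in the future is kept while the feeder is running, so a new item for the same host still has to wait for the delay; once `canBeRemoved` is called with `feederFinished` set to `true`, the queue can be dropped.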
Note: this PR needs to be adapted to #728 (NUTCH-2946), or vice versa, whichever is merged first. The state of queues also needs to be preserved if fetcher.max.exceptions.per.queue == -1 but fetcher.exceptions.per.queue.delay != -1.
@sebastian-nagel which PR do you want reviewed first?
NUTCH-2946/#728 (Markus also reviewed it on Jira) - I'll update this PR as soon as the other is merged.
Updated to be based on the master branch after merging NUTCH-2946/#728. The state of a queue is also preserved if fetcher.exceptions.per.queue.delay > 0.0. (In the discussion of NUTCH-2946 with Markus, we agreed to define the delay in seconds as a float, just like the other fetcher delays. Internally, the fetcher handles all delays in milliseconds.)
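A minimal sketch of how the two settings could be read and combined, assuming a plain `java.util.Properties` object in place of Nutch's Hadoop-based configuration. The class `SketchExceptionConfig`, its fields, and the fallback value `-1` used when a property is absent are hypothetical. The sketch converts the float delay in seconds to milliseconds and reports whether empty queues must keep their exception state.

```java
import java.util.Properties;

/** Hypothetical helper: property names are from the PR, parsing logic is assumed. */
class SketchExceptionConfig {
    final int maxExceptionsPerQueue;   // -1 disables the limit
    final float exceptionDelaySeconds; // configured delay in seconds (float)
    final long exceptionDelayMs;       // same delay in milliseconds, -1 if disabled

    SketchExceptionConfig(Properties conf) {
        // Assumption: -1 is used as the fallback when a property is not set.
        maxExceptionsPerQueue = Integer.parseInt(
            conf.getProperty("fetcher.max.exceptions.per.queue", "-1"));
        exceptionDelaySeconds = Float.parseFloat(
            conf.getProperty("fetcher.exceptions.per.queue.delay", "-1"));
        // Delays are configured in seconds but handled internally in milliseconds.
        exceptionDelayMs = exceptionDelaySeconds > 0.0f
            ? Math.round(exceptionDelaySeconds * 1000.0)
            : -1L;
    }

    /** Exception counters must survive empty queues if either feature is active. */
    boolean needsQueueState() {
        return maxExceptionsPerQueue != -1 || exceptionDelaySeconds > 0.0f;
    }
}
```

With fetcher.exceptions.per.queue.delay set to `1.5`, for instance, the sketch yields a 1500 ms delay and keeps queue state even though fetcher.max.exceptions.per.queue is left at `-1`.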