
NUTCH-2947 Fetcher: keep state of empty fetch queues unless queue feeder is finished

Open sebastian-nagel opened this issue 3 years ago • 3 comments

Fetcher: to ensure politeness, keep the state of empty but stateful fetch queues unless the queue feeder is finished. A queue counts as stateful in these cases:

  • next fetch time not yet reached
  • non-zero exception counter and queue feeder still adding new fetch items to queues

Only if the queue feeder is finished and no more new fetch items are added can these queues finally be removed.
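The rule above can be sketched roughly as follows. This is only an illustration of the decision described in this PR, not the actual Nutch code: the class `QueueRetentionSketch`, the `QueueState` holder, and the method `canRemove` are hypothetical names, and the sketch assumes that an empty queue may be dropped as soon as the feeder is finished.

```java
// Hypothetical sketch of the queue retention rule; not the Nutch implementation.
final class QueueRetentionSketch {

    /** Simplified, assumed view of the state of one fetch queue. */
    static final class QueueState {
        final int pendingItems;          // fetch items still waiting in the queue
        final long nextFetchTimeMillis;  // earliest time the next fetch is allowed
        final int exceptionCount;        // exceptions observed for this queue so far

        QueueState(int pendingItems, long nextFetchTimeMillis, int exceptionCount) {
            this.pendingItems = pendingItems;
            this.nextFetchTimeMillis = nextFetchTimeMillis;
            this.exceptionCount = exceptionCount;
        }
    }

    /** Returns true if the queue may be removed without losing politeness state. */
    static boolean canRemove(QueueState q, boolean feederAlive, long nowMillis) {
        if (q.pendingItems > 0) {
            return false; // a non-empty queue is never removed
        }
        if (!feederAlive) {
            return true; // feeder finished: no new items can arrive for this queue
        }
        boolean stillWaiting = q.nextFetchTimeMillis > nowMillis;
        boolean hasExceptions = q.exceptionCount > 0;
        // While the feeder is still adding items, keep any queue that carries state.
        return !(stillWaiting || hasExceptions);
    }

    public static void main(String[] args) {
        QueueState q = new QueueState(0, 5_000L, 0);
        System.out.println("removable while feeder alive: " + canRemove(q, true, 1_000L));
        System.out.println("removable after feeder finished: " + canRemove(q, false, 1_000L));
    }
}
```

Keeping the empty queue object around means that a fetch item added later for the same queue still sees the pending delay and the exception counter instead of starting from a fresh queue.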

Note: this PR needs to be adapted to #728 (NUTCH-2946) or vice versa, whichever is merged first. The state of queues also needs to be preserved if fetcher.max.exceptions.per.queue == -1 but fetcher.exceptions.per.queue.delay != -1.

sebastian-nagel avatar May 03 '22 14:05 sebastian-nagel

@sebastian-nagel which PR do you want review on first?

lewismc avatar May 05 '22 22:05 lewismc

@sebastian-nagel which PR do you want review on first?

NUTCH-2946/#728 (Markus also reviewed it on Jira) - I'll update this PR as soon as the other is merged.

sebastian-nagel avatar May 06 '22 12:05 sebastian-nagel

Updated to be based on the master branch after merging NUTCH-2946/#728. The state of a queue is also preserved if fetcher.exceptions.per.queue.delay > 0.0. (In the discussion of NUTCH-2946 with Markus, we agreed to define the delay in seconds as a float, just like the other fetcher delays. Internally, the fetcher handles all delays in milliseconds.)
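As a rough illustration of the unit handling mentioned here, a delay given in seconds as a float could be converted to milliseconds like this. The class `ExceptionDelaySketch` and its methods are hypothetical, and reading the value from the configuration is left out; only the property name fetcher.exceptions.per.queue.delay comes from this PR.

```java
// Hypothetical sketch: seconds (float) to milliseconds, plus the "> 0.0" check.
final class ExceptionDelaySketch {

    /** Converts a delay configured in seconds to milliseconds, rounding to the nearest ms. */
    static long toMillis(float delaySeconds) {
        return Math.round(delaySeconds * 1000.0d);
    }

    /** Assumed condition from this PR: queue state is preserved only for a positive delay. */
    static boolean preservesQueueState(float delaySeconds) {
        return delaySeconds > 0.0f;
    }

    public static void main(String[] args) {
        float configured = 1.5f; // stand-in for the value of fetcher.exceptions.per.queue.delay
        System.out.println(toMillis(configured) + " ms, preserve state: "
            + preservesQueueState(configured));
    }
}
```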

sebastian-nagel avatar May 19 '22 13:05 sebastian-nagel