
Adaptive processing

Open nsoft opened this issue 2 years ago • 0 comments

As a precursor to #115, we will want to ensure that the current node is optimizing for the rate-limiting step. The basic task is to identify steps that are always busy, and then allow more threads to be applied to those steps.

One thought is to monitor what percentage of wall clock time each step spends processing vs. waiting to take an item from its queue. Where wall/wait is large compared to other steps, that step needs more processing power. We need to consider the relationship to other steps and apply a threshold, because in some cases the document source is the limiting factor, and nothing should be adjusted in that case. Limits to avoid uncontrolled addition of threads, and awareness of the available processors etc., are of course needed. In no case should we add threads if we already have 2x (?) as many threads as available processing cores.
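To make the idea concrete, here is a rough sketch of the decision, not a proposal for the actual implementation. Everything in it is hypothetical: `AdaptiveThreadSketch`, `StepStats`, `stepToBoost`, and the `busyThreshold` and `margin` parameters are made up for illustration and are not existing JesterJ types. It uses the fraction of wall time not spent waiting (rather than the raw wall/wait ratio) to avoid dividing by zero when a step never waits, and it takes the tentative 2x cap as a `maxThreadsPerCore` parameter.

```java
import java.util.Map;
import java.util.Optional;

public class AdaptiveThreadSketch {

    /** Hypothetical timing sample for one step over a sampling window. */
    public static final class StepStats {
        final long wallNanos; // elapsed wall clock time in the window
        final long waitNanos; // part of that time spent blocked taking from the step's queue

        public StepStats(long wallNanos, long waitNanos) {
            this.wallNanos = wallNanos;
            this.waitNanos = waitNanos;
        }

        /** Fraction of the window spent working rather than waiting, clamped to [0, 1]. */
        double busyFraction() {
            if (wallNanos <= 0) return 0.0;
            double busy = 1.0 - (double) waitNanos / wallNanos;
            return Math.max(0.0, Math.min(1.0, busy));
        }
    }

    /**
     * Returns the name of the step that should get another thread, or empty if
     * nothing should change (thread cap reached, no step is busy enough, or the
     * busiest step does not stand out from the others).
     */
    public static Optional<String> stepToBoost(Map<String, StepStats> stats,
                                               double busyThreshold, double margin,
                                               int currentThreads, int cores,
                                               int maxThreadsPerCore) {
        // Hard cap: never exceed maxThreadsPerCore threads per available core.
        if (currentThreads >= maxThreadsPerCore * cores) return Optional.empty();

        String busiest = null;
        double max = -1.0;
        double sum = 0.0;
        for (Map.Entry<String, StepStats> e : stats.entrySet()) {
            double busy = e.getValue().busyFraction();
            sum += busy;
            if (busy > max) {
                max = busy;
                busiest = e.getKey();
            }
        }
        // If even the busiest step mostly waits, the document source is the limit.
        if (busiest == null || max < busyThreshold) return Optional.empty();

        // Only act when the busiest step clearly exceeds the mean of the other steps.
        int others = stats.size() - 1;
        double meanOthers = others == 0 ? 0.0 : (sum - max) / others;
        return (max - meanOthers >= margin) ? Optional.of(busiest) : Optional.empty();
    }
}
```

In a real node, `cores` would presumably come from `Runtime.getRuntime().availableProcessors()`, and the samples would have to be collected around whatever call each step uses to take from its queue. The threshold and margin values are open questions.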

The current workaround, if one can identify the step manually, is to create a plan with a RoundRobinRouter feeding multiple copies of the step, which then funnel back to the remaining steps. The obvious downsides are needing to redesign your processing repeatedly for performance tuning, and wanting a different plan depending on the underlying hardware.
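For anyone unfamiliar with the pattern, the fan-out part of that workaround amounts to something like the following generic sketch. This is not JesterJ's RoundRobinRouter or plan API; `RoundRobinFanOutSketch` and its methods are hypothetical names used only to show the distribution, and the funnel-back side is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Generic illustration only; this is not JesterJ's RoundRobinRouter. */
public class RoundRobinFanOutSketch {
    private final int copies; // number of duplicated copies of the slow step
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinFanOutSketch(int copies) {
        if (copies < 1) throw new IllegalArgumentException("need at least one copy");
        this.copies = copies;
    }

    /** Index of the step copy that should receive the next document. */
    public int nextCopy() {
        // floorMod keeps the index non-negative even if the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), copies);
    }

    /** Assigns each document to a copy in turn and returns the per-copy batches. */
    public <T> List<List<T>> distribute(List<T> docs) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            batches.add(new ArrayList<>());
        }
        for (T doc : docs) {
            batches.get(nextCopy()).add(doc);
        }
        return batches;
    }
}
```

The point of the sketch is that `copies` is fixed when the plan is built, which is exactly why the plan has to be redesigned whenever the hardware or the bottleneck changes.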

nsoft, Feb 21 '23 18:02