platform Toggle batching for normal running of the pipeline


Status: Open · paul-butcher opened this issue 2 years ago · 5 comments

In normal running, I think the batch consumption of messages (batchRetrieveFlow in the router, id minter, inference manager and matcher) is an unnecessary bottleneck.

Messages appear to arrive at a rate of at most a few tens every few minutes, with occasional spikes of about 300 or 400 (where I assume some automated process kicks off).

I suspect that if we simply processed the messages as they came in, the pipeline would be a lot more responsive and reliable.

One concern is that we might want a longer queue timeout for when things do get spiky.

It might be better to switch behaviour entirely for reindexing vs normal running, rather than try to come up with a timeout/batchsize configuration that works for both.

Normal running - process each message as it comes
Reindexing - bundle into large batches
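The split above can be sketched as a single switch that chooses batch settings from the running mode. This is a minimal illustration in Python, not the pipeline's own code: `Mode`, `BatchSettings` and `settings_for` are hypothetical names, and the batch sizes and wait times are placeholders rather than figures from the pipeline's configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    REINDEX = "reindex"


@dataclass(frozen=True)
class BatchSettings:
    max_size: int            # most messages handled together in one batch
    max_wait_seconds: float  # longest a message waits for its batch to fill


def settings_for(mode: Mode) -> BatchSettings:
    """Pick batching behaviour from the running mode (hypothetical values)."""
    if mode is Mode.NORMAL:
        # Normal running: a batch of one, released as soon as it arrives.
        return BatchSettings(max_size=1, max_wait_seconds=0.0)
    # Reindexing: large batches, accepting extra latency for throughput.
    return BatchSettings(max_size=500, max_wait_seconds=60.0)


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value, settings_for(mode))
```

Selecting the whole behaviour from the mode avoids hunting for one size/timeout pair that suits both sparse day-to-day traffic and a reindex.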

Even its value in the Batcher is questionable: it currently waits for a minute, yet it often processes only one message at a time and rarely more than ten.
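A small simulation shows why a one-minute window buys little under sparse traffic. It assumes simplified size-or-time semantics rather than the Batcher's actual implementation: a batch closes when it holds `max_size` messages, or `max_wait` seconds after its first message arrived, and each message's added delay is the time from its arrival until its batch closes. `simulate_batches` and the arrival times are hypothetical.

```python
from typing import List, Tuple


def simulate_batches(
    arrivals: List[float], max_size: int, max_wait: float
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical, simplified model of size-or-time batching.

    Groups arrival times (in seconds) into batches. A batch closes when it
    reaches max_size messages, or max_wait seconds after its first message
    arrived, whichever comes first. Returns the batches and, for each
    message in arrival order, how long it waited before its batch closed.
    """
    batches: List[List[float]] = []
    waits: List[float] = []
    current: List[float] = []

    def close(at: float) -> None:
        # Every message in the batch is released at the same moment.
        waits.extend(at - t for t in current)
        batches.append(list(current))
        current.clear()

    for t in sorted(arrivals):
        # The open batch timed out before this message arrived.
        if current and t >= current[0] + max_wait:
            close(current[0] + max_wait)
        current.append(t)
        if len(current) == max_size:
            close(t)
    if current:
        # The final partial batch is released when its window expires.
        close(current[0] + max_wait)
    return batches, waits


if __name__ == "__main__":
    arrivals = [0.0, 90.0, 200.0, 205.0, 400.0]  # hypothetical, in seconds
    for size in (10, 1):
        batches, waits = simulate_batches(arrivals, max_size=size, max_wait=60.0)
        print(size, [len(b) for b in batches], sum(waits) / len(waits))
```

With these hypothetical arrivals, a batch size of 10 and a 60-second window give batches of 1, 1, 2 and 1 messages and an average added delay of 59 seconds, while a batch size of 1 releases every message on arrival and adds no delay.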

Originally posted by @paul-butcher in https://github.com/wellcomecollection/platform/issues/5463#issuecomment-1082795647

Mar 31 '22 10:03 paul-butcher


