Toggle batching for normal running of the pipeline
In normal running, I think the batch consumption of messages (batchRetrieveFlow in router, id minter, inference manager, matcher) is an unnecessary bottleneck.
The rate of arrival appears to be less than a few tens every few minutes, with occasional spikes up to about 300 or 400 (where I assume some automated process kicks off).
I suspect that if we simply process the messages as they come in, the pipeline would be a lot more responsive and reliable.
One concern is that we might want a longer queue timeout for when things do get spiky.
It might be better to switch behaviour entirely for reindexing vs normal running, rather than try to come up with a timeout/batchsize configuration that works for both.
Normal running - process each message as it comes
Reindexing - bundle into large batches
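As a rough illustration of that split, here is a minimal sketch of a mode switch that picks batch settings per mode. Everything in it is hypothetical: `RunMode`, `BatchSettings`, `Batching.bundle` and the numbers are made up for this example and are not taken from the pipeline code. It only groups by size; the time-based flush is not implemented here and would be left to whatever streaming library the stages use (for example Akka Streams' `groupedWithin`, assuming that is what sits underneath `batchRetrieveFlow`).

```scala
import scala.concurrent.duration._

// Hypothetical names throughout: none of these exist in the pipeline code.
sealed trait RunMode
case object NormalRunning extends RunMode
case object Reindexing extends RunMode

// Settings that a batching stage would read at start-up.
final case class BatchSettings(maxBatchSize: Int, maxWait: FiniteDuration)

object BatchSettings {
  // The numbers are placeholders, not measured or recommended values.
  def forMode(mode: RunMode): BatchSettings = mode match {
    // Normal running: a "batch" of one, so nothing waits for company.
    case NormalRunning => BatchSettings(maxBatchSize = 1, maxWait = Duration.Zero)
    // Reindexing: large batches, with a wait long enough to fill them.
    case Reindexing    => BatchSettings(maxBatchSize = 100, maxWait = 1.minute)
  }
}

object Batching {
  // Splits already-received messages into batches by size only.
  // maxWait is carried in the settings but deliberately unused here.
  def bundle[A](messages: Seq[A], settings: BatchSettings): Seq[Seq[A]] =
    messages.grouped(settings.maxBatchSize).toSeq
}
```

With settings chosen like this, the same stage code could serve both cases, and the choice of mode would become a deployment-time flag rather than a compromise timeout/batchsize pair.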
Even its value in the Batcher is questionable: it currently waits for a minute, and often only processes one message at a time, rarely more than ten.
Originally posted by @paul-butcher in https://github.com/wellcomecollection/platform/issues/5463#issuecomment-1082795647