opensearch-go

[BUG] Extreme latency in BulkIndexer

Open dokterbob opened this issue 3 years ago • 7 comments

What is the bug? With a BulkIndexer configured with 2 workers, I am getting unexpected latency on BulkIndexer.Add(). The workers do not seem to be consuming the queue within any reasonable timeframe: I'm seeing delays of over 20s!

For example, in the last hour I've seen 53 cases of >1s latency on Add() alone, out of a total of 174 calls.

How can one reproduce the bug? Run a BulkIndexer with 2 workers, add items from different goroutines, and point it at a relatively busy search cluster.

What is the expected behavior? Sub-millisecond latencies, basically the time it takes to shove something into a channel.

What is your host/environment?

  • OS: Ubuntu 20.02
  • Version: 1.1.0 (but nothing has changed in the BulkIndexer since the fork from ES)

Do you have any screenshots? image

dokterbob avatar Jun 02 '22 19:06 dokterbob

Note: returning to the default of numCPU workers seems to alleviate the issue, but given the significant delays (seconds versus sub-millisecond) I would still strongly argue that there is an underlying issue here. At the very least I would suggest documenting this unexpected behaviour.

Please see the difference below: image

The issue seems reduced but it is still occurring! image

CPU load on this server is around 10% and the load average is around 4. About 10% of Add() calls still take >1s.

dokterbob avatar Jun 02 '22 19:06 dokterbob

@VijayanB @VachaShah Any ideas?

dokterbob avatar Jun 16 '22 15:06 dokterbob

Poke!

dokterbob avatar Jul 07 '22 07:07 dokterbob

@dokterbob This looks visibly problematic, but it doesn't look like folks here got around to looking into it. Let's try to move this forward. First, what's the easiest way to reproduce this (maybe post code similar to the benchmarks in this project)? Are you able to bulk load data a lot faster into this instance with other mechanisms (i.e. is this a client issue for sure)?

dblock avatar Jul 11 '22 18:07 dblock

@dokterbob Do you mind posting some of the code you were using to help pinpoint this issue?

APoolio avatar Aug 05 '22 21:08 APoolio

Sorry, I didn't see the messages. Code is https://github.com/ipfs-search/ipfs-search/ but of course you'll need a more detailed test case.

After increasing the number of workers, the problem seems to have become less severe. Now that it's been picked up, I'll see if I can get more concrete feedback over the next couple of weeks.

dokterbob avatar Oct 16 '22 14:10 dokterbob

@dokterbob Hey, I want to work on solving this issue. How relevant is it still? Are there any problems now, or was it on the client side?

zethuman avatar Apr 04 '23 01:04 zethuman