
Intermittent slow requests from NEST

Open kierenj opened this issue 3 years ago • 3 comments

NEST/Elasticsearch.Net version: 7.13.2

Elasticsearch version: 7.16.1

.NET runtime version: .NET 5.0

Operating system version: Debian GNU/Linux 10 (buster)

Description of the problem including expected versus actual behavior: More than 99% of queries of all kinds (search, scroll, get document) complete in under 100 ms, usually far less. Occasionally, however, the NEST client takes longer, sometimes much longer: 1 second, 3 seconds, or more.

Slowlog is configured on the server to log anything above 0.1 s, yet even in cases where NEST reports an HTTP request to Elasticsearch taking >3 s, nothing is logged, so I believe the delay is client-side.

The client and server are on the same Kubernetes cluster. CPU and RAM use is low on both client and server.

I've monitored for thread pool exhaustion. At one point I did see the thread pool queue length spike during one of the delays, so I added a SetMinThreads call; since then the queue has stayed close to empty, but the issue persists.
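For reference, here's a minimal sketch of the kind of thread-pool adjustment and monitoring I mean (the thresholds and sampling interval are illustrative, not our actual values):

```csharp
using System;
using System.Threading;

class ThreadPoolCheck
{
    static void Main()
    {
        // Raise the minimum worker/IOCP threads so bursts don't have to wait for the
        // thread pool's slow thread-injection rate (64 is an arbitrary example value).
        ThreadPool.GetMinThreads(out var worker, out var iocp);
        ThreadPool.SetMinThreads(Math.Max(worker, 64), Math.Max(iocp, 64));

        // Periodically sample the queue depth and thread count (.NET Core 3.0+ APIs).
        using var timer = new Timer(_ =>
            Console.WriteLine($"{DateTime.UtcNow:O} queued={ThreadPool.PendingWorkItemCount} threads={ThreadPool.ThreadCount}"),
            null, TimeSpan.Zero, TimeSpan.FromSeconds(1));

        Console.ReadLine();
    }
}
```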

Here's a recent example:

[image: Kibana screenshot showing one of the slow requests]

I went so far as to capture a packet trace, which shows a different example: NEST took about 1 second to make the request, yet nothing appeared in the slow log:

[image: Wireshark packet capture of the slow request]

I believe this shows the client making a TCP connection, but then waiting for almost a second before sending the request.

I'll admit I'm not au fait with the NEST/Elasticsearch.Net codebase; after an hour or so of digging I couldn't track down where the HTTP requests are actually issued, so I can't tell whether this could be a bug in NEST/Elasticsearch.Net or something at the .NET level.

I've also (I think) ruled out network connectivity by running a bash script on the same host that replays, in a tight loop, a request which had triggered the issue via NEST. After thousands of iterations, nothing took more than 100 ms.

Steps to reproduce: Currently this occurs intermittently in the test build of our app, with only one user but an intermittent workload profile similar to the Kibana screenshot above.

Expected behavior: Requests don't experience random pauses.

Provide ConnectionSettings (if relevant):

Provide DebugInformation (if relevant):

kierenj avatar Mar 25 '22 09:03 kierenj

Hi, @kierenj.

For HTTP itself we depend on the .NET stack, but of course there could be delays elsewhere in the client. The first question is: did you configure Wireshark to show the time since the previous packet? If not, that 1 s could include the time taken to send the POST itself.

The slow log is only likely to help if the search itself is slow; I don't believe it covers the HTTP layer. Enabling the REST request tracer may be more useful for collecting extra data on the server side.
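In 7.x that's a logger setting; something along these lines should switch it on (a sketch only — verify the logger name for your version, and set it back to null afterwards since it's verbose):

```csharp
using System;
using Nest;

var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

// Sketch: enable the REST request tracer via a transient cluster setting.
// "logger.org.elasticsearch.http.HttpTracer" is the 7.x HTTP tracer logger.
var response = client.Cluster.PutSettings(s => s
    .Transient(t => t
        .Add("logger.org.elasticsearch.http.HttpTracer", "TRACE")));

Console.WriteLine(response.DebugInformation);
```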

On the client, you could listen to the diagnostic source events we emit, which include details of more of the internals, such as serialization and HTTP. You may also want to listen to various .NET diagnostics. Our APM agent would also help by showing spans for some of these pieces.
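For the diagnostic source events, a rough starting point is something like the listener below. It simply dumps every event with a timestamp; in practice you'd filter to the client's sources (the exact names are exposed on the DiagnosticSources class in Elasticsearch.Net) and to "HttpHandlerDiagnosticListener" for the .NET HTTP stack:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Sketch: print every DiagnosticSource event with a timestamp so that slow spans
// can be correlated with the intermittent delays.
public sealed class DiagnosticsDump : IObserver<DiagnosticListener>, IObserver<KeyValuePair<string, object>>
{
    private readonly string _source;

    private DiagnosticsDump(string source = null) => _source = source;

    // Call once at startup, before the client is created, and keep the subscription alive.
    public static IDisposable Start() =>
        DiagnosticListener.AllListeners.Subscribe(new DiagnosticsDump());

    public void OnNext(DiagnosticListener listener) =>
        listener.Subscribe(new DiagnosticsDump(listener.Name));

    public void OnNext(KeyValuePair<string, object> evt) =>
        Console.WriteLine($"{DateTime.UtcNow:O} [{_source}] {evt.Key}");

    public void OnCompleted() { }
    public void OnError(Exception error) { }
}
```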

stevejgordon avatar Mar 28 '22 09:03 stevejgordon

Re the packet trace: it shows that the TCP connection was established quickly (0.7 ms) and that the delay was on the client side, i.e. >940 ms of waiting before the request was sent. The "0" time is just when the first packet of the TCP handshake was sent. To rephrase the packet trace:

  1. Client sends first SYN packet to establish connection with server
  2. (0.7ms passes)
  3. Server responds back to client with SYN/ACK, to say connection will be accepted
  4. (0.02ms passes)
  5. Client sends ACK back to server. At this point, TCP connection is established and both sides know it
  6. (945ms inexplicably passes)
  7. Client sends HTTP request to server

So, the delay is all client-side.

I'll try using dotnet-trace to listen for DiagnosticSource events. Thanks!

kierenj avatar Mar 28 '22 14:03 kierenj

Hi @kierenj,

I know it's been a while, but did you find the cause of those intermittent slow requests? We're experiencing the same thing against an Elastic Cloud deployment (version 7.17.5). Requests are fast (<100 ms) when using a plain HttpClient in a loop (with a small delay between calls), but when doing the same with NEST, most calls take <100 ms while some take up to 12 seconds.
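For context, the HttpClient comparison is essentially a timing loop like the sketch below (the endpoint, index and body are placeholders rather than our real ones, and authentication is omitted):

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Sketch of a raw-HttpClient timing loop (placeholder endpoint/index/body; add auth as needed).
var http = new HttpClient { BaseAddress = new Uri("https://example-deployment.es.io:9243/") };
const string body = @"{ ""query"": { ""match_all"": {} }, ""size"": 1 }";

for (var i = 0; i < 1000; i++)
{
    var sw = Stopwatch.StartNew();
    using var content = new StringContent(body, Encoding.UTF8, "application/json");
    using var response = await http.PostAsync("my-index/_search", content);
    await response.Content.ReadAsStringAsync();
    sw.Stop();

    if (sw.ElapsedMilliseconds > 100)
        Console.WriteLine($"slow: {sw.ElapsedMilliseconds} ms (status {(int)response.StatusCode})");

    await Task.Delay(50); // small delay between calls
}
```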

We've tried playing with different parameters in ConnectionSettings, with no luck.
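For illustration, these are the kinds of parameters in question (cloud id, credentials and values below are placeholders, not our actual configuration):

```csharp
using System;
using Elasticsearch.Net;
using Nest;

// Sketch of ConnectionSettings options relevant to connection behaviour
// (cloud id, credentials and values are placeholders).
var pool = new CloudConnectionPool("cloud-id-here", new BasicAuthenticationCredentials("user", "password"));

var settings = new ConnectionSettings(pool)
    .RequestTimeout(TimeSpan.FromSeconds(10))
    .ConnectionLimit(80)                                                   // per-endpoint connection limit
    .EnableTcpKeepAlive(TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(5)) // keep idle connections alive
    .EnableHttpCompression()
    .EnableDebugMode();                                                    // captures request/response bytes and timings

var client = new ElasticClient(settings);
```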

There is one scenario where my app sends two requests: one to get aggregates and one to perform the actual search. I noticed that if I create the two tasks and then await Task.WhenAll(aggregateTask, searchTask), I hit the slow-request problem much more often than if I await each request sequentially (which obviously slows the overall operation down a bit).
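Simplified, the two patterns look roughly like the sketch below (index name, document type and queries are placeholders):

```csharp
using System.Threading.Tasks;
using Nest;

public class MyDoc { public string SomeField { get; set; } }

public static class TwoRequestScenario
{
    // Concurrent pattern: awaiting both requests together hits the multi-second delays far more often.
    public static async Task ConcurrentAsync(IElasticClient client)
    {
        var aggregateTask = client.SearchAsync<MyDoc>(s => s.Index("my-index").Size(0)
            .Aggregations(a => a.Terms("by_field", t => t.Field(f => f.SomeField))));
        var searchTask = client.SearchAsync<MyDoc>(s => s.Index("my-index").Size(20)
            .Query(q => q.MatchAll()));

        await Task.WhenAll(aggregateTask, searchTask);
    }

    // Sequential pattern: slower overall, but the random pauses mostly disappear.
    public static async Task SequentialAsync(IElasticClient client)
    {
        var aggregates = await client.SearchAsync<MyDoc>(s => s.Index("my-index").Size(0)
            .Aggregations(a => a.Terms("by_field", t => t.Field(f => f.SomeField))));
        var hits = await client.SearchAsync<MyDoc>(s => s.Index("my-index").Size(20)
            .Query(q => q.MatchAll()));
    }
}
```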

Any details/hint you remember would be greatly appreciated.

Thanks!

afmercure avatar Aug 12 '22 19:08 afmercure

Hello, any updates on this? We're facing the same issue using the NEST library (7.10) to query Elasticsearch hosted in AWS. The majority of requests complete without issues, but some of them hang for many seconds, up to the timeout (30 s). There are no entries in the Elasticsearch slow logs, so it seems the issue is library-related.

mgjam avatar Dec 08 '22 20:12 mgjam

@mgjam This hasn't been reproduced on our end, and we cannot take it further without a repeatable reproduction. When we have investigated delays with other customers, they have generally come down to server response times or to networking between client and server.

What server version are you running in your case, and is it self-managed? Note that our client does not support the Amazon OpenSearch Service (previously named Amazon Elasticsearch Service). Have you investigated the AWS networking stack between the client and the server? And have you tried the latest client release, which we always recommend in order to pick up bug fixes and performance improvements?

If you can listen to, collect, and share the diagnostic source events the client emits, they include details that might help pinpoint where the delays occur if this is client-side.

stevejgordon avatar Dec 13 '22 13:12 stevejgordon

Closing as stale.

stevejgordon avatar Apr 11 '23 09:04 stevejgordon