memtier_benchmark icon indicating copy to clipboard operation
memtier_benchmark copied to clipboard

memtier_benchmark doest not exit as expected

Open ymjzxw opened this issue 6 years ago • 27 comments

In my test enviroment, I got the following outputs: $ ./memtier_benchmark -s 192.168.111.126 -p 2001 -n 1 -c 1 -t 2 -d 20000 --cluster-mode [RUN #1] Preparing benchmark client... [RUN #1] Launching threads now... [RUN #1 100%, 0 secs] 2 threads: 2 ops, 6557 (avg: 6557) ops/sec, 125.38MB/sec (avg: 125.38MB/sec), 0.39 (avg: 0.39) msec latency

It will hang here like forever. By reading the code, it seems it should exit and output like this: [RUN #1] Preparing benchmark client... [RUN #1] Launching threads now... [RUN #1 100%, 0 secs] 0 threads: 2 ops, 0 (avg: 6756) ops/sec, 0.00KB/sec (avg: 129.20MB/sec), 0.00 (avg: 0.37) msec latency

2 Threads 1 Connections per thread 1 Requests per client

ALL STATS

Type Ops/sec Hits/sec Misses/sec MOVED/sec ASK/sec Latency KB/sec

Sets 6779.66 --- --- 0.00 0.00 0.37200 132746.29 Gets 0.00 0.00 0.00 0.00 0.00 0.00000 0.00 Waits 0.00 --- --- --- --- 0.00000 --- Totals 6779.66 0.00 0.00 0.00 0.00 0.37200 132746.29

Request Latency Distribution Type <= msec Percent

SET 0.350 50.00 SET 0.390 100.00

Why? My test environment is cluster mode.

ymjzxw avatar Oct 15 '19 10:10 ymjzxw

Following up on this issue... any updates? I'm facing the same problem

vasiloeth avatar Nov 28 '19 16:11 vasiloeth

Seems like the fix was removed from the mainline again. Any reason for this or was this by accident? #106

lc0305 avatar Feb 13 '21 20:02 lc0305

Hi, @lc0305 the line was removed when we handled "fix issue #128 zeros in ouptut". Do you still encounter the above "memtier_benchmark doest not exit as expected #94"?

YaacovHazan avatar Feb 14 '21 10:02 YaacovHazan

Yes it did not exit for me in cluster-mode when benchmarking against a remote host, but did exit when benchmarking against localhost. I added the line from the previous fix and recompiled memtier and it fixed the issue for me.

lc0305 avatar Feb 14 '21 10:02 lc0305

With the same command-line arguments? if not, can you share please the command line that you used?

YaacovHazan avatar Feb 14 '21 10:02 YaacovHazan

It did not exit with the default arguments except for --cluster-mode -s -p set to the remote IP address and port. benchmark ran on macOS 11 cluster ran on Debian 10 (Kernel 4.19)

lc0305 avatar Feb 14 '21 10:02 lc0305

@lc0305 can you please, run it with --debug and share the output?

YaacovHazan avatar Feb 15 '21 18:02 YaacovHazan

Get the same problem. Here the result with --debug Capture

Teddyhoang avatar Jun 22 '21 09:06 Teddyhoang

@Teddyhoang not sure I got it... it seems that in your example the test exit as expected, no?

YaacovHazan avatar Jun 22 '21 09:06 YaacovHazan

Hi there, I'm running into the same problem with it didn't exit running cluster mode. It stuck at 100%. Any idea on what causes it and potential fixes?

Thanks!

molian6 avatar Jun 29 '23 19:06 molian6

@molian6, which version do you use? The new 2.0.0 version includes a fix around cluster mode: https://github.com/RedisLabs/memtier_benchmark/pull/205

YaacovHazan avatar Jun 29 '23 20:06 YaacovHazan

I'm pulling the redislabs/memtier_benchmark:edge image, and looks like it's 2.0.0

docker run -it redislabs/memtier_benchmark:edge --version
memtier_benchmark 2.0.0
Copyright (C) 2011-2022 Redis Ltd.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

molian6 avatar Jun 29 '23 21:06 molian6

@molian6 can you run it with --debug and paste the output?

YaacovHazan avatar Jul 01 '23 18:07 YaacovHazan

It's not stuck all the time, but about 30% of the runs. I managed to repro once with --debug. The command I ran is following, and I ran against a 3 shard redis cluster:

memtier_benchmark --server=<IP> --port=6379 --threads=10 --clients=100 --pipeline=1 --key-prefix=memtier- --key-minimum=1 --key-maximum=1000000 --cluster-mode --hide-histogram --debug --print-percentiles=50,90,95,99,99.9 --test-time=60 --ratio=1:3 --data-size=1024 --key-pattern=R:R

The behavior I observed is it hung there for 10+ minutes, but it would finally return with the correct stats report. I obfuscated the endpoint, but they are just the 3 node ip and port endpoint, but following it the last 100 logs I got for --debug

docker logs -n 100 ac002ff3ce32
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 449.40) msec latency
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 452.11) msec latency
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 453.21) msec latency
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 454.86) msec latency
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 455.96) msec latency
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): +OK, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 457.64) msec latency
shard_connection.cpp:442: server <endpoint B>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 458.76) msec latency
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 459.32) msec latency
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 461.02) msec latency
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 461.58) msec latency
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 465.59) msec latency
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 467.30) msec latency
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 469.06) msec latency
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 469.64) msec latency
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): +OK, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 471.40) msec latency
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 471.99) msec latency
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint C>: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:508: <endpoint C> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint A>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 473.18) msec latency
shard_connection.cpp:508: <endpoint A> Done, no requests to send no response to wait for
shard_connection.cpp:442: server <endpoint B>: handled response (first line): $-1, 0 hits, 1 misses2MB/sec), 344820.22 (avg: 473.81) msec latency
shard_connection.cpp:508: <endpoint B> Done, no requests to send no response to wait for

molian6 avatar Jul 05 '23 06:07 molian6

Maybe it's also worth mention that this happened more frequently when server drop the connections. It looks like to maybe some error handling cases we've missed.

connection dropped.
connection dropped.
connection dropped.
connection dropped.
connection dropped.
connection dropped.
connection dropped.cs] 10 threads:        5299 ops,     365 (avg:     256) ops/sec, 365.78MB/sec (avg: 256.30MB/sec), 250.15 (avg: 540.09) msec latency
connection dropped.
connection dropped.cs] 10 threads:        5651 ops,     351 (avg:     260) ops/sec, 351.85MB/sec (avg: 260.71MB/sec), 259.79 (avg: 522.63) msec latency
connection dropped.
connection dropped.cs] 10 threads:        6044 ops,     392 (avg:     266) ops/sec, 392.84MB/sec (avg: 266.54MB/sec), 217.75 (avg: 502.81) msec latency
connection dropped.cs] 10 threads:        6829 ops,     382 (avg:     276) ops/sec, 382.90MB/sec (avg: 276.74MB/sec), 247.52 (avg: 473.57) msec latency
connection dropped.
connection dropped.cs] 10 threads:        7672 ops,     445 (avg:     287) ops/sec, 445.86MB/sec (avg: 287.59MB/sec), 222.60 (avg: 448.10) msec latency
connection dropped.cs] 10 threads:        8038 ops,     365 (avg:     290) ops/sec, 365.91MB/sec (avg: 290.42MB/sec), 224.05 (avg: 437.90) msec latency
connection dropped.cs] 10 threads:        8794 ops,     379 (avg:     296) ops/sec, 379.91MB/sec (avg: 296.31MB/sec), 252.20 (avg: 421.99) msec latency
connection dropped.cs] 10 threads:        9531 ops,     358 (avg:     300) ops/sec, 358.91MB/sec (avg: 300.87MB/sec), 238.30 (avg: 407.40) msec latency
connection dropped.cs] 10 threads:       10384 ops,     432 (avg:     308) ops/sec, 432.83MB/sec (avg: 308.32MB/sec), 293.98 (avg: 400.08) msec latency
connection dropped.cs] 10 threads:       11451 ops,     345 (avg:     312) ops/sec, 345.85MB/sec (avg: 312.18MB/sec), 286.60 (avg: 386.75) msec latency
connection dropped.ecs] 10 threads:       24087 ops,       0 (avg:     287) ops/sec, 0.00KB/sec (avg: 287.74MB/sec),  -nan (avg: 299.19) msec latencycyy
connection dropped.ecs] 10 threads:       24087 ops,       0 (avg:     245) ops/sec, 0.00KB/sec (avg: 245.14MB/sec),  -nan (avg: 299.19) msec latency

molian6 avatar Jul 05 '23 22:07 molian6

Thanks, @molian6. I tried to reproduce it with no success. I'll try to see if playing with the server connection helps.

YaacovHazan avatar Jul 06 '23 09:07 YaacovHazan

Hi @molian6 see PR #230 , As you suggested, I did find an issue when a connection drops, and can cause memtier-benchmark to hang in cluster-mode.

YaacovHazan avatar Jul 09 '23 19:07 YaacovHazan

Thanks for the quick fix! I'll patch the PR and verify if we see issues resolved.

molian6 avatar Jul 10 '23 21:07 molian6

Patched your changes but still getting the same issue

Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection reset by peer72080 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.44) msec latency Connection error: Connection reset by peer Connection error: Connection reset by peer Connection error: Connection reset by peer Connection error: Connection reset by peer72080 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.44) msec latency Connection error: Connection reset by peer Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out Connection error: Connection timed out157472084 ops, 670237 (avg: 524905) ops/sec, 190.70MB/sec (avg: 149.35MB/sec), 18.29 (avg: 23.46) msec latency Connection error: Connection timed out Connection error: Connection timed out

shailymittal avatar Jul 13 '23 20:07 shailymittal

Hi @shailymittal can you provide more details? is it fails constantly? command line & output with debug level can be useful

YaacovHazan avatar Jul 17 '23 17:07 YaacovHazan