valkey icon indicating copy to clipboard operation
valkey copied to clipboard

[Test Failure] Socket timeout for AOF test

Open madolson opened this issue 10 months ago • 5 comments

Error

sock56048fcc87c0 => (SPAWNED SERVER) pid:38170 - tests/integration/aof.tcl

We are seeing a fairly consistent failure like https://github.com/valkey-io/valkey/actions/runs/12919448042/job/36029989392#step:8:6146.

In this case, the client connecting the AOF server is hanging after starting the process but before executing the test. We can see that Turning appendonly on and off within a transaction finished, so it must be failing on Test cluster slots / cluster shards in aof won't crash which is the test that comes after. I really don't have a great read on this, but there seems to be a few occurrences . Another example is https://github.com/valkey-io/valkey/actions/runs/12940498740/job/36094850711#step:8:6290.

@enjoy-binbin I see you added the test, https://github.com/valkey-io/valkey/commit/80fcbd3fece6decb6195dc1a4289ebb675b5d45d. Maybe you have some thoughts on it.

madolson avatar Jan 24 '25 13:01 madolson

Not sure if it has anything to do with it Starting server on 127.0.0.1:23206 Port 23206 was already busy, trying another port..., i don't have other clues right now.

enjoy-binbin avatar Jan 25 '25 07:01 enjoy-binbin

most recent: https://github.com/valkey-io/valkey/actions/runs/14805407197/job/41572746840#step:8:6290

roshkhatri avatar May 06 '25 23:05 roshkhatri

This seams like a containerized test issue. I was seeing the same errors while running all the test on local. Ran the test in a loop like this

./runtest --list-tests | while read -r test; do\
 echo "Running test: $test"; \
./runtest --single "$test" --verbose --accurate --dump-logs; \
done && \
echo "All tests completed!"

~~Which runs successfully.~~ Which also fails .

We would want to skip these tests

I also tried running the integration/aof.tcl separately. It does not crash/fail/timeout.

roshkhatri avatar May 07 '25 21:05 roshkhatri

Please disregard other comments. I tested this test on clean machine with ipv6 enabled,

the test doesnt fail. at this point I do not have any idea how to reproduce it. The latest run, the test passes:https://github.com/valkey-io/valkey/actions/runs/14895719363/job/41837740060

roshkhatri avatar May 09 '25 03:05 roshkhatri

The latest run, the test passes:https://github.com/valkey-io/valkey/actions/runs/14895719363/job/41837740060

We know the test is flakey, it doesn't fail consistently.

madolson avatar May 09 '25 17:05 madolson

Some more failure runs from 8.1.2 release #2173

https://github.com/valkey-io/valkey/actions/runs/15591253939/job/43910656748?pr=2173 https://github.com/valkey-io/valkey/actions/runs/15591253939/job/43910656742?pr=2173

hpatro avatar Jun 11 '25 18:06 hpatro