agones
agones copied to clipboard
Flaky: TestGameServerRestartBeforeReadyCrash
What happened:
The test TestGameServerRestartBeforeReadyCrash is quite flaky!
Short version:
--- FAIL: TestGameServerRestartBeforeReadyCrash (213.29s)
gameserver_test.go:353:
Error Trace: gameserver_test.go:353
Error: Received unexpected error:
timed out waiting for the condition
Test: TestGameServerRestartBeforeReadyCrash
What you expected to happen:
The test to pass consistently.
How to reproduce it (as minimally and precisely as possible):
Run builds.
Anything else we need to know?:
Long log:
time="2022-01-21 05:46:12.582" level=info msg="Waiting for us to have an address to send things to" test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:15.786" level=info msg="Waiting for states to match" awaitingState=Scheduled currentState=Scheduled gs=game-servertf8gs test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:15.786" level=info msg="GameServer created" gs=game-servertf8gs test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:15.786" level=info msg="Dialing UDP message to address" address="35.247.116.52:7735" test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:15.786" level=info msg="crashing, and waiting to see restart" test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:18.285" level=info msg="sending message" fields.msg=CRASH gs=game-servertf8gs state=Scheduled test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:21.030" level=info msg="sending message" fields.msg=CRASH gs=game-servertf8gs state=Scheduled test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:23.832" level=info msg="sending message" fields.msg=CRASH gs=game-servertf8gs state=Scheduled test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:26.831" level=info msg="sending message" fields.msg=CRASH gs=game-servertf8gs state=Scheduled test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:29.633" level=info msg="successfully crashed. Moving on!" test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:32.769" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-servertf8gs test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:36.168" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-servertf8gs test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:39.369" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-servertf8gs test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:41.567" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-servertf8gs test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:41.568" level=info msg="marking GameServer as ready" test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:44.433" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Scheduled test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:47.431" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:50.034" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:53.035" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:56.237" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:46:59.235" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:01.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:04.434" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:07.232" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:10.433" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:13.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:17.034" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:19.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:22.634" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:26.032" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:29.035" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:31.449" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:34.447" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:37.235" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:40.233" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:43.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:46.235" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:49.451" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:52.232" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:54.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:47:57.469" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:00.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:02.641" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:05.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:07.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:10.233" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:13.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:15.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:18.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:21.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:24.432" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:27.236" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:30.235" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:32.832" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:35.635" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:38.432" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:41.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:44.433" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:47.436" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:50.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:52.833" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:55.067" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:48:57.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:00.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:03.030" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:05.236" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:07.434" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:09.633" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:11.432" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:13.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:15.433" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:17.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:19.034" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:21.032" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:22.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:24.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:25.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:27.432" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:28.832" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:30.232" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:31.434" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:32.831" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:33.634" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:34.433" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:35.434" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:36.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:36.433" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:37.033" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:37.835" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:38.432" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:38.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:39.233" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:39.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:40.162" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:40.635" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:41.032" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:41.234" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:41.834" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:42.161" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:42.632" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:43.034" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:43.434" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:43.837" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:44.161" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:44.634" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:45.037" level=info msg="sending message" fields.msg=READY gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:45.037" level=info msg="crashing again, should be unhealthy" test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:45.260" level=info msg="checking final crash state" gs=game-servertf8gs state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
time="2022-01-21 05:49:45.260" level=info msg="Unhealthy! We are done!" test=TestGameServerRestartBeforeReadyCrash
--- FAIL: TestGameServerRestartBeforeReadyCrash (213.29s)
Test: TestGameServerRestartBeforeReadyCrash
Environment: N/A
- Agones version: dev
- Kubernetes version (use
kubectl version
): 1.21.0 - Cloud provider or hardware configuration: GKE
- Install method (yaml/helm): helm
- Troubleshooting guide log(s): See above
- Others: N/A
This has happened in a few builds lately, but I think the line numbers may have changed - or not
22: e2e-feature-gates https://github.com/googleforgames/agones/pull/2547#issuecomment-1104473512
--- FAIL: TestGameServerRestartBeforeReadyCrash (223.08s)
gameserver_test.go:353:
Error Trace: gameserver_test.go:353
Error: Received unexpected error:
timed out waiting for the condition
Test: TestGameServerRestartBeforeReadyCrash
Just tested this locally, looks like this is still an issue.
root@8308eef5623c:/go/src/agones.dev/agones/test/e2e# go test -race -run TestGameServerRestartBeforeReadyCrash -count 100
....
INFO[2022-08-26 19:45:35.856] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:36.057] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:36.258] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:36.461] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:36.666] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:36.862] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:37.060] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:37.260] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:37.452] sending message fields.msg=CRASH gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:37.658] successfully crashed. Moving on! test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:37.691] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:38.728] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:39.731] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:40.732] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:41.733] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:42.729] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:42.772] Waiting for states to match awaitingState=Unhealthy currentState=Scheduled gs=game-serverj55pk test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:42.772] marking GameServer as ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:42.846] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:43.129] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:43.333] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:43.536] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:43.740] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:43.930] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:44.138] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:44.330] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:44.531] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:44.730] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:44.930] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:45.133] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:45.334] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:45.539] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:45.720] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:45.941] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:46.136] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:46.342] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:46.536] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:46.733] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:46.940] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:47.120] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:47.325] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:47.528] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:47.737] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:47.937] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:48.130] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:48.336] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:48.531] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:48.744] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:48.931] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:49.135] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:49.333] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:49.531] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:49.722] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:49.937] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:50.148] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:50.344] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:50.539] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:50.737] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:50.935] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:51.138] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:51.337] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:51.533] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:51.728] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:51.933] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:52.133] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:52.342] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:52.533] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:52.731] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:52.936] sending message fields.msg=READY gs=game-serverj55pk state=Scheduled test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.126] sending message fields.msg=READY gs=game-serverj55pk state=RequestReady test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.330] sending message fields.msg=READY gs=game-serverj55pk state=RequestReady test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.533] sending message fields.msg=READY gs=game-serverj55pk state=RequestReady test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.720] ready! Moving On! test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.720] crashing again, should be unhealthy test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.806] checking final crash state gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:53.806] sending message fields.msg=CRASH gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.086] checking final crash state gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.087] sending message fields.msg=CRASH gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.292] checking final crash state gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.293] sending message fields.msg=CRASH gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.496] checking final crash state gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.497] sending message fields.msg=CRASH gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.694] checking final crash state gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.694] sending message fields.msg=CRASH gs=game-serverj55pk state=Ready test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.899] checking final crash state gs=game-serverj55pk state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:54.900] Unhealthy! We are done! test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:55.016] Waiting for us to have an address to send things to test=TestGameServerRestartBeforeReadyCrash
INFO[2022-08-26 19:45:55.082] Waiting for states to match awaitingState=Scheduled currentState=Creating gs=game-serverw57fc test=TestGameServerRestartBeforeReadyCrash
panic: test timed out after 10m0s
goroutine 4655 [running]:
testing.(*M).startAlarm.func1()
/usr/local/go/src/testing/testing.go:1788 +0xbb
created by time.goFunc
/usr/local/go/src/time/sleep.go:180 +0x4a
goroutine 1 [chan receive]:
testing.tRunner.func1()
/usr/local/go/src/testing/testing.go:1225 +0x635
testing.tRunner(0xc00017eb60, 0xc000157980)
/usr/local/go/src/testing/testing.go:1265 +0x269
testing.runTests(0xc000490680, {0x3299400, 0x46, 0x46}, {0x1204310, 0xc00052a1c0, 0x32abd00})
/usr/local/go/src/testing/testing.go:1596 +0x7cb
testing.(*M).Run(0xc000490680)
/usr/local/go/src/testing/testing.go:1504 +0x9d2
agones.dev/agones/test/e2e.TestMain(0x400)
/go/src/agones.dev/agones/test/e2e/main_test.go:94 +0x9e8
main.main()
_testmain.go:183 +0x265
goroutine 4 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x0)
/go/src/agones.dev/agones/vendor/k8s.io/klog/v2/klog.go:1181 +0x8b
created by k8s.io/klog/v2.init.0
/go/src/agones.dev/agones/vendor/k8s.io/klog/v2/klog.go:420 +0x1c5
goroutine 30 [IO wait]:
internal/poll.runtime_pollWait(0x7f5b3133a7d8, 0x72)
/usr/local/go/src/runtime/netpoll.go:229 +0x89
internal/poll.(*pollDesc).wait(0xc00011e118, 0xc000012000, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0xbd
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00011e100, {0xc000012000, 0x44ee, 0x44ee})
/usr/local/go/src/internal/poll/fd_unix.go:167 +0x419
net.(*netFD).Read(0xc00011e100, {0xc000012000, 0x44ee, 0x44ee})
/usr/local/go/src/net/fd_posix.go:56 +0x51
net.(*conn).Read(0xc000306010, {0xc000012000, 0x44ee, 0x44ee})
/usr/local/go/src/net/net.go:183 +0xb1
crypto/tls.(*atLeastReader).Read(0xc0001319b0, {0xc000012000, 0x44ee, 0x44ee})
/usr/local/go/src/crypto/tls/conn.go:777 +0x86
bytes.(*Buffer).ReadFrom(0xc0000ae278, {0x255b700, 0xc0001319b0})
/usr/local/go/src/bytes/buffer.go:204 +0x113
crypto/tls.(*Conn).readFromUntil(0xc0000ae000, {0x255e220, 0xc000306010}, 0x5)
/usr/local/go/src/crypto/tls/conn.go:799 +0x1df
crypto/tls.(*Conn).readRecordOrCCS(0xc0000ae000, 0x0)
/usr/local/go/src/crypto/tls/conn.go:606 +0x3fe
crypto/tls.(*Conn).readRecord(...)
/usr/local/go/src/crypto/tls/conn.go:574
crypto/tls.(*Conn).Read(0xc0000ae000, {0xc00028b000, 0x1000, 0x0})
/usr/local/go/src/crypto/tls/conn.go:1277 +0x29c
bufio.(*Reader).Read(0xc000700540, {0xc00027e3c0, 0x9, 0x9})
/usr/local/go/src/bufio/bufio.go:227 +0x4db
io.ReadAtLeast({0x255b560, 0xc000700540}, {0xc00027e3c0, 0x9, 0x9}, 0x9)
/usr/local/go/src/io/io.go:328 +0xde
io.ReadFull(...)
/usr/local/go/src/io/io.go:347
golang.org/x/net/http2.readFrameHeader({0xc00027e3c0, 0x9, 0x9}, {0x255b560, 0xc000700540})
/go/src/agones.dev/agones/vendor/golang.org/x/net/http2/frame.go:237 +0x96
golang.org/x/net/http2.(*Framer).ReadFrame(0xc00027e380)
/go/src/agones.dev/agones/vendor/golang.org/x/net/http2/frame.go:498 +0x108
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc0005d3f78)
/go/src/agones.dev/agones/vendor/golang.org/x/net/http2/transport.go:2101 +0x1f3
golang.org/x/net/http2.(*ClientConn).readLoop(0xc0004b1080)
/go/src/agones.dev/agones/vendor/golang.org/x/net/http2/transport.go:1997 +0xb5
created by golang.org/x/net/http2.(*Transport).newClientConn
/go/src/agones.dev/agones/vendor/golang.org/x/net/http2/transport.go:725 +0x14cb
goroutine 4703 [select]:
k8s.io/apimachinery/pkg/util/wait.poller.func1.1()
/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:708 +0x2ed
created by k8s.io/apimachinery/pkg/util/wait.poller.func1
/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:691 +0x12c
goroutine 4701 [select]:
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext({0x2589408, 0xc0000420c8}, 0xc000323260, 0x18)
/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:658 +0x189
k8s.io/apimachinery/pkg/util/wait.poll({0x2589408, 0xc0000420c8}, 0x1, 0x198f601, 0x4a1045)
/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:594 +0xe6
k8s.io/apimachinery/pkg/util/wait.PollImmediateWithContext({0x2589408, 0xc0000420c8}, 0xc0003f7540, 0x0, 0x0)
/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:526 +0x66
k8s.io/apimachinery/pkg/util/wait.PollImmediate(0xc0002edab0, 0x22a41e0, 0x4)
/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:512 +0x71
agones.dev/agones/test/e2e/framework.(*Framework).WaitForGameServerState(0xc000491f00, 0xc00017ed00, 0xc0000d5400, {0x22a968d, 0x9}, 0x0)
/go/src/agones.dev/agones/test/e2e/framework/framework.go:257 +0x2dc
agones.dev/agones/test/e2e.TestGameServerRestartBeforeReadyCrash(0xc00017ed00)
/go/src/agones.dev/agones/test/e2e/gameserver_test.go:278 +0x705
testing.tRunner(0xc00017ed00, 0x2366730)
/usr/local/go/src/testing/testing.go:1259 +0x230
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:1306 +0x727
exit status 2
FAIL agones.dev/agones/test/e2e 600.514s
I saw this fail recently as well. Grabbing as a good first issue.
Re: https://github.com/googleforgames/agones/issues/2445#issuecomment-1228927983, the problem here (and I ran into it as well), is that -count <X>
is still constrained by -timeout
, as evidenced by:
INFO[2022-08-26 19:45:55.082] Waiting for states to match awaitingState=Scheduled currentState=Creating gs=game-serverw57fc test=TestGameServerRestartBeforeReadyCrash
panic: test timed out after 10m0s
I'm trying this locally with -count 100 -timeout 0
now, but I saw a flake here yesterday, too: https://github.com/googleforgames/agones/pull/2782#issuecomment-1298755291 .
I feel like this one is a nightmare to replicate.
Breadcrumbs: In the case of https://github.com/googleforgames/agones/issues/2445#issue-1110631394 and https://github.com/googleforgames/agones/pull/2782#issuecomment-1298755291 both, we see this kind of odd pattern:
Step #23 - "e2e-stable": time="2022-11-01 15:59:22.597" level=info msg="checking final crash state" gs=game-serverxsdhc state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:59:22.598" level=info msg="Unhealthy! We are done!" test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": --- FAIL: TestGameServerRestartBeforeReadyCrash (205.04s)
Step #23 - "e2e-stable": gameserver_test.go:354:
Step #23 - "e2e-stable": Error Trace: gameserver_test.go:354
Step #23 - "e2e-stable": Error: Received unexpected error:
Step #23 - "e2e-stable": timed out waiting for the condition
Step #23 - "e2e-stable": Test: TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:59:23.025" level=info msg="Checking Fleet Ready replicas" expected=1 fleet=simple-fleet-1.08n4sb fleetStatus="{Replicas:2 ReadyReplicas:0 ReservedReplicas:0 AllocatedReplicas:2 Players:<nil>}" test="TestFleetRollingUpdate/Use_fleet_Patch_true_25%_cycle_true"
note that we're saying Unhealthy! We are done!
, but then failing on a timeout. I included the line after it to understand the timing, which is very tight. Ok, but the Unhealthy! We are done!
is from: https://github.com/googleforgames/agones/blob/dc3eb37f05e5c567e3d642ef4a77f0e59e993446/test/e2e/gameserver_test.go#L363
but the error trace is line 354 above it: https://github.com/googleforgames/agones/blob/dc3eb37f05e5c567e3d642ef4a77f0e59e993446/test/e2e/gameserver_test.go#L354
ETA: This is presumably because assert.NoError
only calls t.Errorf
- so we see a timeout and then continue on.
I think in both cases I can see this pattern (** for emphasis):
Step #23 - "e2e-stable": time="2022-11-01 15:56:08.199" level=info msg="successfully crashed. Moving on!" test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:10.391" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-serverxsdhc test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:14.192" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-serverxsdhc test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:16.592" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-serverxsdhc test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:19.391" level=info msg="Waiting for states to match" awaitingState=Unhealthy currentState=Scheduled gs=game-serverxsdhc test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:19.391" level=info msg="marking GameServer as ready" test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:22.198" level=info msg="sending message" fields.msg=READY gs=game-serverxsdhc state=Scheduled test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:24.399" level=info msg="sending message" fields.msg=READY gs=game-serverxsdhc state=Scheduled test=TestGameServerRestartBeforeReadyCrash
** Step #23 - "e2e-stable": time="2022-11-01 15:56:27.397" level=info msg="sending message" fields.msg=READY gs=game-serverxsdhc state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:29.998" level=info msg="sending message" fields.msg=READY gs=game-serverxsdhc state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:33.199" level=info msg="sending message" fields.msg=READY gs=game-serverxsdhc state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
Step #23 - "e2e-stable": time="2022-11-01 15:56:36.398" level=info msg="sending message" fields.msg=READY gs=game-serverxsdhc state=Unhealthy test=TestGameServerRestartBeforeReadyCrash
In particular, it looks like during the flakes, the GameServer
is always transitioning Scheduled
-> Unhealthy
.
I have not been able to repro locally yet (hundreds of runs over the afternoon) - this may take some logs analysis.
I scoured the build logs using the hyper-advanced technique of:
gsutil -m cp gs://agones-build-logs/* .
egrep "Test:.*TestGameServerRestartBeforeReadyCrash" *.txt
Unfortunately, build retention is short enough that I only see the failure in https://github.com/googleforgames/agones/pull/2782#issuecomment-1298755291.
Going to follow up on the CI clusters and see if I can maybe poke at logs there.
Another dupe: https://github.com/googleforgames/agones/pull/2790#issuecomment-1301529764
So, I've been doing some logs analysis on https://github.com/googleforgames/agones/pull/2782#issuecomment-1298755291. What I see is that we crash the container twice, and due to backoff, there's a pretty big gap in health checks:
It's not clear this gap matters, but it's certainly curious. But then it does get to RequestReady
state:
.. and fails just as quickly?
FWIW, the pod update there seems to be .. possibly irrelevant, not sure:
Anyways, still looking, and now I have another dupe to look at tomorrow.
I have a theory as to what's happening here. I'll distill it down to a time-sequence first, then show the logs. I think what's happening is:
time | event |
---|---|
0 | container c14f starts |
1 | kubelet updates Pod status to show container c14f as running (restartCount=0 ) |
2 | container c14f crashes (forced by CRASH message from e2e) |
3 | kubelet restarts container, container 7c7c starts |
4 | e2e sends another CRASH |
5 | kubelet updates Pod status to shows container 7c7c as running (restartCount=1 ) |
6 | e2e recognizes crash has succeeded and stops sending CRASH , moves on to send READY |
7 | container 7c7c crashes (forced by CRASH message at t=4) |
8 | kubelet restarts container, container 2e5d starts |
9 | container 2e5d receives READY , moving GameServer to RequestReady |
10 | controller sees RequestReady and moves GameServer to Ready , with GameServerReadyContainerIDAnnotation as 7c7c . (Reminder, controller has not seen a Pod update yet) |
11 | kubelet updates Pod status to shows container 2e5d as running (restartCount=2 ) |
12 | controller sees 2e5d != 7c7c and assumes Pod died after calling Ready() . |
We see echoes of this in the logs for https://github.com/googleforgames/agones/pull/2790#issuecomment-1301529764 (I am eliding the first container, c14f
- the only importance of that container is that the e2e moved on from CRASH
to READY
):
- 2022-11-03T00:46:03.552890Z: 7c7c dies
SyncLoop (PLEG): event for pod" pod="1667436145/game-server5m66g" event=&{ID:cc19f829-800c-47b6-b93e-619e07cddd66 Type:ContainerDied Data:7c7ce1cfe5f4717e5fbf14e68806a59a8babde538834ffbb015dcbb1ecb02157}
- 2022-11-03T00:46:14.622672Z: 2e5d started
SyncLoop (PLEG): event for pod" pod="1667436145/game-server5m66g" event=&{ID:cc19f829-800c-47b6-b93e-619e07cddd66 Type:ContainerStarted Data:2e5db813bc74279ad454d1974637aa2f4c16f32ea77c942f300e6480db0bd7b4}
-
2022-11-03T00:46:19.171202006Z: We see a
Syncing RequestReady State
. Note that this is the last log in that message - there's also aSDK.Ready() complete
event recorded during the same second (46:19) that appears out of order in Logs Explorer due to lack of precision in the event log. -
2022-11-03T00:46:19.191195Z: just after
SDK.Ready
we see a Pod update that still shows7c7c
as running - this is why I am guessing the sequence around t=10 above:
{
"ready": true,
"lastState": {
"terminated": {
"exitCode": 1,
"reason": "Error",
"finishedAt": "2022-11-03T00:45:57Z",
"startedAt": "2022-11-03T00:45:55Z",
"containerID": "containerd://c14f0b62e3cff080fd9662f1478fd06eef363a878dd732c45dfabf92017e56bd"
}
},
"name": "game-server",
"image": "gcr.io/agones-images/simple-game-server:0.14",
"imageID": "gcr.io/agones-images/simple-game-server@sha256:564578aa131ae856bdb7aef3f389a34fc73e7e396b5838ae9bc65f80909b8b96",
"containerID": "containerd://7c7ce1cfe5f4717e5fbf14e68806a59a8babde538834ffbb015dcbb1ecb02157",
"state": {
"running": {
"startedAt": "2022-11-03T00:45:59Z"
}
},
"started": true,
"restartCount": 1
}
- 2022-11-03T00:46:20.772518856Z:
agones-controller
sees the container2e5d
for the first time and that7c7c
has failed:
{
"insertId": "7tw4phgwr0rveiqq",
"jsonPayload": {
"containerStatuses": [
{
"imageID": "us-docker.pkg.dev/agones-images/ci/agones-sdk@sha256:73413d1cf61febccf1305f22195309c01b8be402cbde7f9fb2762b0f7fea543c",
"image": "us-docker.pkg.dev/agones-images/ci/agones-sdk:1.28.0-66f21f3",
"name": "agones-gameserver-sidecar",
"started": true,
"lastState": {},
"ready": true,
"restartCount": 0,
"state": {
"running": {
"startedAt": "2022-11-03T00:45:55Z"
}
},
"containerID": "containerd://197488cfc52f5cc6683c0c96d7c92bdc517675627b3e465c64ad02a5fb602216"
},
{
"imageID": "gcr.io/agones-images/simple-game-server@sha256:564578aa131ae856bdb7aef3f389a34fc73e7e396b5838ae9bc65f80909b8b96",
"containerID": "containerd://2e5db813bc74279ad454d1974637aa2f4c16f32ea77c942f300e6480db0bd7b4",
"state": {
"running": {
"startedAt": "2022-11-03T00:46:14Z"
}
},
"name": "game-server",
"restartCount": 2,
"image": "gcr.io/agones-images/simple-game-server:0.14",
"ready": true,
"lastState": {
"terminated": {
"containerID": "containerd://7c7ce1cfe5f4717e5fbf14e68806a59a8babde538834ffbb015dcbb1ecb02157",
"reason": "Error",
"finishedAt": "2022-11-03T00:46:00Z",
"exitCode": 1,
"startedAt": "2022-11-03T00:45:59Z"
}
},
"started": true
}
],
"gs": "game-server5m66g",
"message": "Container Failed",
"container": "game-server",
"source": "*gameservers.HealthController"
},
"resource": {
"type": "k8s_container",
"labels": {
"location": "us-west1-c",
"project_id": "agones-images",
"namespace_name": "agones-system",
"container_name": "agones-controller",
"pod_name": "agones-controller-6b94677bc6-rnbwq",
"cluster_name": "e2e-test-cluster"
}
},
"timestamp": "2022-11-03T00:46:20.772518856Z",
"severity": "DEBUG",
"labels": {
"k8s-pod/agones_dev/role": "controller",
"k8s-pod/heritage": "Helm",
"k8s-pod/release": "agones",
"k8s-pod/app": "agones",
"k8s-pod/pod-template-hash": "6b94677bc6",
"compute.googleapis.com/resource_name": "gke-e2e-test-cluster-agones-system-79cf09e3-bgs7"
},
"logName": "projects/agones-images/logs/stderr",
"receiveTimestamp": "2022-11-03T00:46:25.762176239Z"
}
This is some super interesting analysis!
The thing I'm not 100% sure on with the analysis, is that this is the order of updates on the Pod first, then the GameServer for the GameServerReadyContainerIDAnnotation
https://github.com/googleforgames/agones/blob/3c38876382d505244af68489211d8916dce07596/pkg/gameservers/controller.go#L854-L864
So if the Pod is currently out of sync, the Update
call should fail, since there is a newer generation in K8s.
Also, in the health controller, if the annotations are out of sync with each other (which I don't think should ever happen? Is that what you are definitely seeing here?) the health controller will return an error, to kick it into a retry, rather than move it to Unhealthy
. 🤔
https://github.com/googleforgames/agones/blob/f9c333d25417fc5523583bb294ce520e7a33830b/pkg/gameservers/health.go#L255-L257
| So if the Pod is currently out of sync, the Update call should fail, since there is a newer generation in K8s.
Hmm. I think that as long as step 10 and 11 occur in this order (per earlier comment):
time | event |
---|---|
7 | container 7c7c crashes (forced by CRASH message at t=4) |
8 | kubelet restarts container, container 2e5d starts |
9 | container 2e5d receives READY , moving GameServer to RequestReady |
10 | controller sees RequestReady and moves GameServer to Ready , with GameServerReadyContainerIDAnnotation as 7c7c . (Reminder, controller has not seen a Pod update yet) |
11 | kubelet updates Pod status to shows container 2e5d as running (restartCount=2 ) |
then kubelet
basically loses the race to update the pod (it's update in step 10, I didn't make that explicit).
| Also, in the health controller, if the annotations are out of sync with each other (which I don't think should ever happen? Is that what you are definitely seeing here?) the health controller will return an error, to kick it into a retry
I haven't seen the error from line 256, so I assume not.
Wait! Yes, the second screenshot in https://github.com/googleforgames/agones/issues/2445#issuecomment-1301540694 had exactly that error message: pod and gameserver game-serverxsdhc data are out of sync, retrying
~Oh, except that error is also exactly what you expect after the container restarts, that's right. (Trying to page this all back in.)~ (ignore me, I think I'm confused.)
The health.go:256 error message is present, but I think that's expected in between the update to the Pod
and the update to the GameServer
.
This going to be a bit spammy with pictures, I hope they help. In order, I see:
-
game-server
updates toRequestReady
(I surmise this bycallerSuppliedUserAgent
beingsdk-server/v0.0.0 [...]
, but I didn't include that in the screen shot) (this is step 9 in the sequence above): -
I think this is the updates from line 855 and line 861. Note that in the middle we see some red herring updates from other controllers, including the
out of sync, retrying
. This is expected between updates. This is step 10 in the sequence above: -
Right after, we signal
SDK.Ready()
: -
But then right after, I believe this is
kubelet
updating the pod (SyncLoop UPDATE
is basically saying "I found an update during my sync loop"), after which we noticeContainer Failed
. This is step 11 in the sequence above:
I realize I am inferring a lot from the SyncLoop
, but unfortunately we don't keep audit logs on Pod.Status
updates, so it's hard to draw a strong conclusion. We know that line 855/861 result in synchronized (but bad) state of GameServerReadyContainerIDAnnotation
because we see the skipUnhealthyGameContainer: Container crashed after Ready, returning false
message that happens after the HealthController
verifies the pod/GS match: https://github.com/googleforgames/agones/blob/d5cf2b0c8ed5e085cd532ae476b0dbdd15b35a29/pkg/gameservers/health.go#L279
So I'm pretty confident in this analysis. I think if kubelet
loses the race, we can't actually use the container ID the way it's being used here.
I question the value of this feature and would like to return to an earlier question I asked on chat: Is this solving a real problem? Do customers often have workloads that flap before Ready()
, i.e. a "settling period"? Or is this just kind of a nice to have feature? If it's the latter, I'd really recommend we just move this to "any termination => Unhealthy
" and eliminate this logic. It's really complicated to get right.
If it's a really necessary feature for some usecase, I think there's a better way to handle this, but it's invasive: The game server container can generate a random cookie/nonce and communicate that with Ready()
and on healthchecks. On healthcheck, the SDK checks the cookie and fail healthchecks immediately if there's any mismatch. Basically, take any of the container analysis out of it and just make this strictly a conversation between the game server and the sidecar - that should be a lot more bulletproof at detecting that the game server that called Ready()
is also the one that's currently running.
ETA: Another option to keep the feature might be to implement some delay to RequestReady
. We could implement a speed-bump (even 1s would probably close it) at https://github.com/googleforgames/agones/blob/d5cf2b0c8ed5e085cd532ae476b0dbdd15b35a29/pkg/sdkserver/sdkserver.go#L443
Also, this race only really exists because the game server in this case was able to call Ready()
so quickly. Another way to close the race from the unit test perspective is to add a speed bump to simple-game-server
(it could even be done for only this test). With a minor delay, I expect kubelet
will always win the race.
If we fix it this way, we are effectively agreeing that in a narrow set of circumstances, the GameServer
may land in Unhealthy
unnecessarily - i.e. we accept that the race exists but in 99.9% of cases it just doesn't matter. That's an ok compromise, I suspect.