Watch dropping an event when compacting on delete
Bug report criteria
- [ ] This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
- [ ] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- [ ] You have read the etcd bug reporting guidelines.
- [ ] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
When testing https://github.com/etcd-io/etcd/pull/17833 I encountered watch breaking reliable guarantee.
Report included 1716928418720891814.zip
What did you expect to happen?
Watch should not break guarantee
How can we reproduce it (as minimally and precisely as possible)?
TODO
Anything else we need to know?
No response
Etcd version (please run commands below)
v3.5
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO starting proxy... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "from": "https://localhost:20013", "to": "https://localhost:20011"}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO starting proxy... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "from": "https://localhost:20003", "to": "https://localhost:20001"}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO starting proxy... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "from": "https://localhost:20008", "to": "https://localhost:20006"}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "--listen-client-urls=http://localhost:20010", "--advertise-client-urls=http://localhost:20010", "--listen-peer-urls=https://localhost:20011", "--initial-advertise-peer-urls=https://localhost:20013", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/003", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--initial-cluster-token=new", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--experimental-snapshot-catchup-entries=100", "--experimental-compaction-batch-limit=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:32381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/[email protected]/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=https://localhost:20001", "--initial-advertise-peer-urls=https://localhost:20003", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/001", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster-token=new", "--experimental-snapshot-catchup-entries=100", "--experimental-compaction-batch-limit=1000", "--snapshot-count=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/[email protected]/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
logger.go:146: 2024-05-28T22:33:31.699+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "--listen-client-urls=http://localhost:20005", "--advertise-client-urls=http://localhost:20005", "--listen-peer-urls=https://localhost:20006", "--initial-advertise-peer-urls=https://localhost:20008", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/002", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--experimental-snapshot-catchup-entries=100", "--initial-cluster-token=new", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--experimental-compaction-batch-limit=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:22381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/[email protected]/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
logger.go:146: 2024-05-28T22:33:33.927+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "pid": 191460}
logger.go:146: 2024-05-28T22:33:33.927+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "pid": 191459}
logger.go:146: 2024-05-28T22:33:33.927+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "pid": 191458}
logger.go:146: 2024-05-28T22:33:33.931+0200 INFO Start traffic
logger.go:146: 2024-05-28T22:33:34.969+0200 INFO Triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
cluster.go:1066: members agree on a leader, members:map[12416079282240904009:1 13770228943176794332:2 16914881897345358027:0] , leader:13770228943176794332
logger.go:146: 2024-05-28T22:33:35.012+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-05-28T22:33:35.012+0200 INFO Waiting for member to exit {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:35.019+0200 INFO server exited {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "code": 2}
logger.go:146: 2024-05-28T22:33:35.019+0200 INFO Member exited as expected {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:35.019+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:35.019+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.5-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "--listen-client-urls=http://localhost:20010", "--advertise-client-urls=http://localhost:20010", "--listen-peer-urls=https://localhost:20011", "--initial-advertise-peer-urls=https://localhost:20013", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdHighTrafficClusterOfSize34219590533/003", "--snapshot-count=1000", "--peer-cert-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/serathius/src/go.etcd.io/etcd/tests/fixtures/ca.crt", "--initial-cluster-token=new", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--experimental-snapshot-catchup-entries=100", "--experimental-compaction-batch-limit=1000", "--initial-cluster=TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1597671579/001", "name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:32381", "ETCD_VERIFY=all", "PATH=/home/serathius/pkg/mod/golang.org/[email protected]/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/bin:/home/serathius//bin/:/home/serathius/.gvm/pkgsets/go1.22/global/bin:/home/serathius/.gvm/gos/go1.22/bin:/home/serathius/.gvm/pkgsets/go1.22/global/overlay/bin:/home/serathius/.gvm/bin:/home/serathius/.gvm/bin:/home/serathius/.cabal/bin/:/home/serathius/.local/bin:/home/serathius/.local/share/umake/bin:/home/serathius/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
logger.go:146: 2024-05-28T22:33:36.129+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "pid": 191560}
logger.go:146: 2024-05-28T22:33:36.130+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-05-28T22:33:37.138+0200 INFO Finished traffic
logger.go:146: 2024-05-28T22:33:38.141+0200 INFO Reporting complete traffic {"successes": 2340, "failures": 74, "successRate": 0.9693454846727423, "period": "3.207153256s", "qps": 729.6190151257306}
logger.go:146: 2024-05-28T22:33:38.141+0200 INFO Reporting traffic before failure injection {"successes": 1250, "failures": 1, "successRate": 0.9992006394884093, "period": "1.038661462s", "qps": 1203.4720125198983}
logger.go:146: 2024-05-28T22:33:38.141+0200 INFO Reporting traffic during failure injection {"successes": 64, "failures": 41, "successRate": 0.6095238095238096, "period": "1.161063679s", "qps": 55.12186898751485}
logger.go:146: 2024-05-28T22:33:38.141+0200 INFO Reporting traffic after failure injection {"successes": 1026, "failures": 32, "successRate": 0.9697542533081286, "period": "1.007428115s", "qps": 1018.4349480856011}
logger.go:146: 2024-05-28T22:33:38.141+0200 INFO Finished simulating traffic {"max-revision": 1107}
logger.go:146: 2024-05-28T22:33:38.202+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-05-28T22:33:38.697+0200 INFO Linearization success {"duration": "494.602505ms"}
logger.go:146: 2024-05-28T22:33:38.698+0200 INFO Validating watch
logger.go:146: 2024-05-28T22:33:38.715+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 4, "diff": " []model.PersistedEvent{\n \t... // 725 identical elements\n \t{Event: {Type: \"put-operation\", Key: \"key4\", Value: {Value: \"632\"}}, Revision: 627},\n \t{Event: {Type: \"put-operation\", Key: \"key7\", Value: {Value: \"641\"}}, Revision: 628},\n- \t{Event: model.Event{Type: \"delete-operation\", Key: \"key0\"}, Revision: 628},\n \t{Event: {Type: \"put-operation\", Key: \"key5\", Value: {Value: \"642\"}}, Revision: 628},\n \t{Event: {Type: \"put-operation\", Key: \"key0\", Value: {Value: \"643\"}}, Revision: 629, ...},\n \t... // 555 identical elements\n }\n"}
validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
logger.go:146: 2024-05-28T22:33:38.715+0200 INFO Validating serializable operations
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814"}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving member data dir {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0", "path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/server-TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving member data dir {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1", "path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/server-TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving member data dir {"member": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2", "path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/server-TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-1/watch.json"}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO no KV operations for client, skip persisting {"client-id": 1}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO no watch operations for client, skip persisting {"client-id": 2}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-2/operations.json"}
logger.go:146: 2024-05-28T22:33:38.721+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-3/watch.json"}
logger.go:146: 2024-05-28T22:33:38.722+0200 INFO no KV operations for client, skip persisting {"client-id": 3}
logger.go:146: 2024-05-28T22:33:38.722+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-4/watch.json"}
logger.go:146: 2024-05-28T22:33:38.722+0200 INFO no KV operations for client, skip persisting {"client-id": 4}
logger.go:146: 2024-05-28T22:33:38.722+0200 INFO no watch operations for client, skip persisting {"client-id": 5}
logger.go:146: 2024-05-28T22:33:38.722+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-5/operations.json"}
logger.go:146: 2024-05-28T22:33:38.723+0200 INFO no watch operations for client, skip persisting {"client-id": 6}
logger.go:146: 2024-05-28T22:33:38.723+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-6/operations.json"}
logger.go:146: 2024-05-28T22:33:38.723+0200 INFO no watch operations for client, skip persisting {"client-id": 7}
logger.go:146: 2024-05-28T22:33:38.723+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-7/operations.json"}
logger.go:146: 2024-05-28T22:33:38.724+0200 INFO no watch operations for client, skip persisting {"client-id": 8}
logger.go:146: 2024-05-28T22:33:38.724+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-8/operations.json"}
logger.go:146: 2024-05-28T22:33:38.724+0200 INFO no watch operations for client, skip persisting {"client-id": 9}
logger.go:146: 2024-05-28T22:33:38.725+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-9/operations.json"}
logger.go:146: 2024-05-28T22:33:38.725+0200 INFO no watch operations for client, skip persisting {"client-id": 10}
logger.go:146: 2024-05-28T22:33:38.725+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-10/operations.json"}
logger.go:146: 2024-05-28T22:33:38.726+0200 INFO no watch operations for client, skip persisting {"client-id": 11}
logger.go:146: 2024-05-28T22:33:38.726+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-11/operations.json"}
logger.go:146: 2024-05-28T22:33:38.726+0200 INFO no watch operations for client, skip persisting {"client-id": 12}
logger.go:146: 2024-05-28T22:33:38.726+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-12/operations.json"}
logger.go:146: 2024-05-28T22:33:38.727+0200 INFO no watch operations for client, skip persisting {"client-id": 13}
logger.go:146: 2024-05-28T22:33:38.727+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-13/operations.json"}
logger.go:146: 2024-05-28T22:33:38.727+0200 INFO no watch operations for client, skip persisting {"client-id": 14}
logger.go:146: 2024-05-28T22:33:38.727+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-14/operations.json"}
logger.go:146: 2024-05-28T22:33:38.728+0200 INFO no watch operations for client, skip persisting {"client-id": 15}
logger.go:146: 2024-05-28T22:33:38.728+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-15/operations.json"}
logger.go:146: 2024-05-28T22:33:38.728+0200 INFO no watch operations for client, skip persisting {"client-id": 16}
logger.go:146: 2024-05-28T22:33:38.728+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/client-16/operations.json"}
logger.go:146: 2024-05-28T22:33:38.729+0200 INFO Saving visualization {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize3/1716928418720891814/history.html"}
logger.go:146: 2024-05-28T22:33:38.772+0200 INFO killing server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
logger.go:146: 2024-05-28T22:33:38.772+0200 INFO killing server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
logger.go:146: 2024-05-28T22:33:38.772+0200 INFO killing server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:38.772+0200 INFO stopping server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:38.772+0200 INFO stopping server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
logger.go:146: 2024-05-28T22:33:38.772+0200 INFO stopping server... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
logger.go:146: 2024-05-28T22:33:38.777+0200 INFO stopped server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
logger.go:146: 2024-05-28T22:33:38.777+0200 INFO stopping proxy... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-0"}
logger.go:146: 2024-05-28T22:33:38.777+0200 INFO stopped server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
logger.go:146: 2024-05-28T22:33:38.777+0200 INFO stopping proxy... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-1"}
logger.go:146: 2024-05-28T22:33:38.777+0200 INFO stopped server. {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
logger.go:146: 2024-05-28T22:33:38.777+0200 INFO stopping proxy... {"name": "TestRobustnessExploratoryEtcdHighTrafficClusterOfSize3-test-2"}
cc @MadhavJivrajani @siyuanfoundation @fuweid
Can someone take a look and double check my findings?
/assign
I rerun the report on latest version of https://github.com/etcd-io/etcd/pull/17833 and it found that linearization was broken. Looks like a false positive due to some bug.
Ok, ok I encountered another case.
Failpoint used: raftBeforeLeaderSend=panic()
Scenario: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
Etcd version:
./bin/etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 6f916d428
Go Version: go1.22.4
Go OS/Arch: linux/amd64
Report: 1717749910517706368.tar.gz
Found when adding compaction to operations https://github.com/serathius/etcd/commit/5959110f4a44ffcfc18ebbcd465ecae06c04b7a5
Logs showing broken watch guarantee due to missing log:
=== RUN TestDataReports/1717749910517706368
logger.go:146: 2024-06-07T11:31:24.546+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-06-07T11:31:24.950+0200 INFO Linearization success {"duration": "404.080944ms"}
logger.go:146: 2024-06-07T11:31:24.951+0200 INFO Validating watch
logger.go:146: 2024-06-07T11:31:24.960+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 2}
[]model.PersistedEvent{
... // 154 identical elements
{Event: {Type: "put-operation", Key: "key3", Value: {Hash: 3071434148}}, Revision: 139},
{Event: {Type: "put-operation", Key: "key5", Value: {Hash: 2150768673}}, Revision: 140},
- {Event: model.Event{Type: "delete-operation", Key: "key6"}, Revision: 141},
{Event: {Type: "put-operation", Key: "key8", Value: {Value: "121"}}, Revision: 141, ...},
{Event: {Type: "put-operation", Key: "key9", Value: {Value: "122"}}, Revision: 141},
... // 134 identical elements
}
validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
logger.go:146: 2024-06-07T11:31:24.960+0200 INFO Validating serializable operations
logger.go:146: 2024-06-07T11:31:24.967+0200 INFO Saving visualization {"path": "./src/go.etcd.io/etcd/tests/robustness/testdata/1717749910517706368/history.html"}
Screenshot showing a transaction for revision 141 and fact that key "key6" was present so delete really happened.
Watch response showing missing delete event
Complete log:
=== RUN TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
(./etcd_--version) (1897882): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897882): Git SHA: 6f916d428
(./etcd_--version) (1897882): Go Version: go1.22.4
(./etcd_--version) (1897882): Go OS/Arch: linux/amd64
(./etcd_--version) (1897894): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897894): Git SHA: 6f916d428
(./etcd_--version) (1897894): Go Version: go1.22.4
(./etcd_--version) (1897894): Go OS/Arch: linux/amd64
logger.go:146: 2024-06-07T10:45:06.314+0200 INFO starting server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:06.314+0200 INFO spawning process {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.3237+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.323787+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.32395+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.323991+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3244+0200","caller":"etcdmain/etcd.go:123","msg":"Initialize and start etcd server","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"empty"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.324433+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324445+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324785+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324963+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":false,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"member-0=http://localhost:20001","initial-cluster-state":"new","initial-cluster-token":"new","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.325529+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0003943b0}"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332264+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3323+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"6.84944ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332419+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332446+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345644+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345682+0200","caller":"etcdserver/bootstrap.go:493","msg":"starting local member","local-member-id":"ca50e9357181d758","cluster-id":"34f27e83b3bc2ff"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345701+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345758+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34579+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345798+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345809+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34584+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.360383+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.364937+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372639+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372684+0200","caller":"etcdserver/server.go:585","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-version":"to_be_decided"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372729+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372781+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372793+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372768+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372839+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372854+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.37288+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372974+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3731+0200","caller":"membership/cluster.go:411","msg":"added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548218+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548313+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548384+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548402+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548425+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559268+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559325+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559374+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559391+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.561842+0200","caller":"etcdserver/server.go:2318","msg":"setting up initial cluster version using v3 API","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564116+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564142+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564779+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565461+0200","caller":"membership/cluster.go:594","msg":"set initial cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565615+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565717+0200","caller":"version/monitor.go:116","msg":"cluster version differs from storage version.","cluster-version":"3.6.0","storage-version":"3.5.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565719+0200","caller":"etcdserver/server.go:2338","msg":"cluster version is updated","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56587+0200","caller":"schema/migration.go:65","msg":"updated storage version","new-storage-version":"3.6.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.567317+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-07T10:45:06.570+0200 INFO started server. {"name": "member-0", "pid": 1897904}
logger.go:146: 2024-06-07T10:45:06.577+0200 INFO Start traffic
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.592935+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593112+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":3,"took":"120.206µs","hash":1734659494,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593139+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1734659494,"revision":3,"compact-revision":-1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.694305+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700835+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":37,"took":"6.440584ms","hash":2304164051,"current-db-size-bytes":303104,"current-db-size":"303 kB","current-db-size-in-use-bytes":274432,"current-db-size-in-use":"274 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700886+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2304164051,"revision":37,"compact-revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.742881+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749877+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":58,"took":"6.926064ms","hash":1646957282,"current-db-size-bytes":376832,"current-db-size":"377 kB","current-db-size-in-use-bytes":335872,"current-db-size-in-use":"336 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749911+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1646957282,"revision":58,"compact-revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.802825+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813113+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":77,"took":"10.227838ms","hash":539014635,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":417792,"current-db-size-in-use":"418 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813147+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":539014635,"revision":77,"compact-revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.85343+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857414+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":85,"took":"3.90014ms","hash":1467298815,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":229376,"current-db-size-in-use":"229 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857438+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1467298815,"revision":85,"compact-revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.074676+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079062+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":97,"took":"4.321109ms","hash":4113805127,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079099+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4113805127,"revision":97,"compact-revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.148722+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":99}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153407+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":99,"took":"4.623808ms","hash":1249150654,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":110592,"current-db-size-in-use":"111 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153465+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1249150654,"revision":99,"compact-revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.198642+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":104}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202576+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":104,"took":"3.889339ms","hash":1699712947,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202591+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1699712947,"revision":104,"compact-revision":99}
logger.go:146: 2024-06-07T10:45:07.573+0200 INFO Triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (1897931): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":139,"raft_term":2}}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (1897941): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":140,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":450560,"leader":14578408409545168728,"raftIndex":178,"raftTerm":2,"raftAppliedIndex":177,"dbSizeInUse":294912,"storageVersion":"3.6.0"}}]
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.601459+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":137}
cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
logger.go:146: 2024-06-07T10:45:07.608+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-06-07T10:45:07.608+0200 INFO Waiting for member to exit {"member": "member-0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610769+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":137,"took":"9.25467ms","hash":583134963,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":368640,"current-db-size-in-use":"369 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610806+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":583134963,"revision":137,"compact-revision":104}
member-0 (1897904): panic: failpoint panic: {}
member-0 (1897904):
member-0 (1897904): goroutine 154 [running]:
member-0 (1897904): go.etcd.io/gofail/runtime.actPanic(0x10100000000?)
member-0 (1897904): go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
member-0 (1897904): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (1897904): go.etcd.io/[email protected]/runtime/terms.go:290
member-0 (1897904): go.etcd.io/gofail/runtime.(*terms).eval(0xc00016fbc0?)
member-0 (1897904): go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
member-0 (1897904): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc0001f9c90?)
member-0 (1897904): go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
member-0 (1897904): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 168
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO server exited {"name": "member-0", "code": 2}
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO Member exited as expected {"member": "member-0"}
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO starting server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO spawning process {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627613+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627706+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627725+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62775+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627828+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"member"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627848+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627856+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628737+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628956+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62947+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000126248}"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.636941+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637029+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"7.640918ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637902+0200","caller":"etcdserver/bootstrap.go:440","msg":"No snapshot found. Recovering WAL from scratch!"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637928+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":520192,"backend-size":"520 kB","backend-size-in-use-bytes":126976,"backend-size-in-use":"127 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639787+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":178}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639855+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640008+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640301+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640316+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640327+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640337+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640351+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640396+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640423+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640438+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 2, commit: 178, applied: 0, lastindex: 179, lastterm: 2]"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.644606+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660157+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660204+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660292+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660307+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660366+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660347+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660385+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660397+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660382+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660464+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.66053+0200","caller":"membership/cluster.go:420","msg":"ignore already added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660578+0200","caller":"membership/cluster.go:586","msg":"updated cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","from":"3.6","to":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944302+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944353+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944416+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.94444+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944485+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.95921+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959254+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959286+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.9593+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964178+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964176+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964348+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.96437+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964837+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
logger.go:146: 2024-06-07T10:45:07.966+0200 INFO started server. {"name": "member-0", "pid": 1897950}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-07T10:45:07.967+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.660836+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668671+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":149,"took":"7.779358ms","hash":3929730471,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":262144,"current-db-size-in-use":"262 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668707+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3929730471,"revision":149,"compact-revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.679649+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":158}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691774+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":158,"took":"12.045594ms","hash":1641924706,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":225280,"current-db-size-in-use":"225 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691827+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1641924706,"revision":158,"compact-revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691851+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":159}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.699572+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":159,"took":"7.690631ms","hash":1830603898,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":147456,"current-db-size-in-use":"148 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.6996+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.69961+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703539+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":161,"took":"3.907723ms","hash":545936533,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":188416,"current-db-size-in-use":"188 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703569+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.726023+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730396+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":168,"took":"4.324136ms","hash":4120988724,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":233472,"current-db-size-in-use":"234 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730417+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4120988724,"revision":168,"compact-revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.797393+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801495+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":213,"took":"4.014924ms","hash":2299796790,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":319488,"current-db-size-in-use":"320 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801512+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2299796790,"revision":213,"compact-revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.822852+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":225}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828806+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":225,"took":"5.900397ms","hash":3531019373,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":65536,"current-db-size-in-use":"66 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828848+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3531019373,"revision":225,"compact-revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.944933+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":240}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948934+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":240,"took":"3.945123ms","hash":2274426623,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":69632,"current-db-size-in-use":"70 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948977+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2274426623,"revision":240,"compact-revision":225}
logger.go:146: 2024-06-07T10:45:09.002+0200 INFO Finished traffic
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting complete traffic {"successes": 678, "failures": 31, "successRate": 0.9562764456981664, "period": "2.425517855s", "qps": 279.52793610748336}
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting traffic before failure injection {"successes": 402, "failures": 0, "successRate": 1, "period": "996.95569ms", "qps": 403.22754966171067}
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting traffic during failure injection {"successes": 7, "failures": 14, "successRate": 0.3333333333333333, "period": "393.883383ms", "qps": 17.77175758643263}
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting traffic after failure injection {"successes": 269, "failures": 17, "successRate": 0.9405594405594405, "period": "1.034678782s", "qps": 259.98406914272647}
logger.go:146: 2024-06-07T10:45:10.015+0200 INFO Finished simulating traffic {"max-revision": 248}
logger.go:146: 2024-06-07T10:45:10.027+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-06-07T10:45:10.502+0200 INFO Linearization success {"duration": "475.746216ms"}
logger.go:146: 2024-06-07T10:45:10.503+0200 INFO Validating watch
logger.go:146: 2024-06-07T10:45:10.509+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 2, "diff": " []model.PersistedEvent{\n \t... // 154 identical elements\n \t{Event: {Type: \"put-operation\", Key: \"key3\", Value: {Hash: 3071434148}}, Revision: 139},\n \t{Event: {Type: \"put-operation\", Key: \"key5\", Value: {Hash: 2150768673}}, Revision: 140},\n- \t{Event: model.Event{Type: \"delete-operation\", Key: \"key6\"}, Revision: 141},\n \t{Event: {Type: \"put-operation\", Key: \"key8\", Value: {Value: \"121\"}}, Revision: 141, ...},\n \t{Event: {Type: \"put-operation\", Key: \"key9\", Value: {Value: \"122\"}}, Revision: 141},\n \t... // 134 identical elements\n }\n"}
validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
logger.go:146: 2024-06-07T10:45:10.509+0200 INFO Validating serializable operations
logger.go:146: 2024-06-07T10:45:10.517+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368"}
logger.go:146: 2024-06-07T10:45:10.517+0200 INFO Saving member data dir {"member": "member-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/server-member-0"}
logger.go:146: 2024-06-07T10:45:10.518+0200 INFO no watch operations for client, skip persisting {"client-id": 1}
logger.go:146: 2024-06-07T10:45:10.518+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-1/operations.json"}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-2/watch.json"}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO no KV operations for client, skip persisting {"client-id": 2}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO no watch operations for client, skip persisting {"client-id": 3}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-3/operations.json"}
logger.go:146: 2024-06-07T10:45:10.520+0200 INFO no watch operations for client, skip persisting {"client-id": 4}
logger.go:146: 2024-06-07T10:45:10.520+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-4/operations.json"}
logger.go:146: 2024-06-07T10:45:10.522+0200 INFO no watch operations for client, skip persisting {"client-id": 5}
logger.go:146: 2024-06-07T10:45:10.522+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-5/operations.json"}
logger.go:146: 2024-06-07T10:45:10.523+0200 INFO no watch operations for client, skip persisting {"client-id": 6}
logger.go:146: 2024-06-07T10:45:10.523+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-6/operations.json"}
logger.go:146: 2024-06-07T10:45:10.524+0200 INFO no watch operations for client, skip persisting {"client-id": 7}
logger.go:146: 2024-06-07T10:45:10.524+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-7/operations.json"}
logger.go:146: 2024-06-07T10:45:10.525+0200 INFO no watch operations for client, skip persisting {"client-id": 8}
logger.go:146: 2024-06-07T10:45:10.525+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-8/operations.json"}
logger.go:146: 2024-06-07T10:45:10.526+0200 INFO no watch operations for client, skip persisting {"client-id": 9}
logger.go:146: 2024-06-07T10:45:10.526+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-9/operations.json"}
logger.go:146: 2024-06-07T10:45:10.527+0200 INFO no watch operations for client, skip persisting {"client-id": 10}
logger.go:146: 2024-06-07T10:45:10.527+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-10/operations.json"}
logger.go:146: 2024-06-07T10:45:10.528+0200 INFO Saving visualization {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/history.html"}
logger.go:146: 2024-06-07T10:45:10.583+0200 INFO killing server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:10.583+0200 INFO stopping server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:10.588+0200 INFO stopped server. {"name": "member-0"}
cc @ahrtr @jmhbnz @wenjiaswe @MadhavJivrajani @siyuanfoundation
@fuweid Think this is related to https://github.com/etcd-io/etcd/issues/17780
See the following logs:
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
...
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-07T10:45:07.967+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}
Looks like we lost first delete after restoring from restoring from "last compact revision". Off by one error on tombstone?
Yes. Let me try to reproduce it in my local. Update it later.
I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.
Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.
In last repro:
logger.go:146: 2024-06-07T14:46:59.319+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 1}
[]model.PersistedEvent{
... // 160 identical elements
{Event: {Type: "delete-operation", Key: "key6"}, Revision: 140},
{Event: {Type: "put-operation", Key: "key9", Value: {Value: "138"}}, Revision: 141, ...},
- {Event: model.Event{Type: "delete-operation", Key: "key7"}, Revision: 142},
{Event: {Type: "put-operation", Key: "key3", Value: {Value: "147"}}, Revision: 143},
{Event: {Type: "put-operation", Key: "key3", Value: {Value: "148"}}, Revision: 144},
... // 79 identical elements
}
etcd bootstrap log:
{"level":"info","ts":"2024-06-07T14:46:56.832687+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":141}
Something similar reproduced on v3.4, however this time it was long after crash, and it flagged broken resumable properly. Which just means it provided incorrect event on a newly opened watch which provided revision.
=== RUN TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
logger.go:146: 2024-06-07T15:10:05.793+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:05.793+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063001): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063001): A future release of golang/protobuf will delete this package,
member-0 (2063001): which has long been excluded from the compatibility promise.
member-0 (2063001):
member-0 (2063001): 2024-06-07 15:10:05.801650 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063001): 2024-06-07 15:10:05.801710 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.801731 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.801737 I | etcdmain: etcd Version: 3.4.32
member-0 (2063001): 2024-06-07 15:10:05.801741 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063001): 2024-06-07 15:10:05.801745 I | etcdmain: Go Version: go1.22.0
member-0 (2063001): 2024-06-07 15:10:05.801750 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063001): 2024-06-07 15:10:05.801755 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.802103 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.802482 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063001): 2024-06-07 15:10:05.802489 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063001): 2024-06-07 15:10:05.802494 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063001): 2024-06-07 15:10:05.802500 I | embed: heartbeat = 100ms
member-0 (2063001): 2024-06-07 15:10:05.802505 I | embed: election = 1000ms
member-0 (2063001): 2024-06-07 15:10:05.802510 I | embed: snapshot count = 1000
member-0 (2063001): 2024-06-07 15:10:05.802521 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063001): 2024-06-07 15:10:05.823121 I | etcdserver: starting member ca50e9357181d758 in cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 0
member-0 (2063001): raft2024/06/07 15:10:05 INFO: newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 1
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.837990 W | auth: simple token is not cryptographically signed
member-0 (2063001): 2024-06-07 15:10:05.849603 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063001): 2024-06-07 15:10:05.849682 I | etcdserver: ca50e9357181d758 as single-node; fast-forwarding 9 ticks (election ticks 10)
member-0 (2063001): 2024-06-07 15:10:05.849979 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850015 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850029 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.851460 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.852075 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 is starting a new election at term 1
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became candidate at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became leader at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2
member-0 (2063001): 2024-06-07 15:10:06.425669 I | etcdserver: setting up the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.427632 I | embed: ready to serve client requests
member-0 (2063001): 2024-06-07 15:10:06.427648 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063001): 2024-06-07 15:10:06.428428 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
member-0 (2063001): 2024-06-07 15:10:06.429334 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.429387 I | etcdserver/api: enabled capabilities for version 3.4
logger.go:146: 2024-06-07T15:10:06.436+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063001}
logger.go:146: 2024-06-07T15:10:06.443+0200 INFO Start traffic
member-0 (2063001): 2024-06-07 15:10:06.463482 I | mvcc: store.index: compact 10
member-0 (2063001): 2024-06-07 15:10:06.467626 I | mvcc: finished scheduled compaction at 10 (took 4.101548ms)
member-0 (2063001): 2024-06-07 15:10:06.485280 I | mvcc: store.index: compact 21
member-0 (2063001): 2024-06-07 15:10:06.491005 I | mvcc: finished scheduled compaction at 21 (took 5.681245ms)
member-0 (2063001): 2024-06-07 15:10:06.499354 I | mvcc: store.index: compact 30
member-0 (2063001): 2024-06-07 15:10:06.503435 I | mvcc: finished scheduled compaction at 30 (took 4.021316ms)
member-0 (2063001): 2024-06-07 15:10:06.516624 I | mvcc: store.index: compact 36
member-0 (2063001): 2024-06-07 15:10:06.522273 I | mvcc: finished scheduled compaction at 36 (took 5.604531ms)
member-0 (2063001): 2024-06-07 15:10:06.531697 I | mvcc: store.index: compact 43
member-0 (2063001): 2024-06-07 15:10:06.544126 I | mvcc: finished scheduled compaction at 43 (took 12.389592ms)
member-0 (2063001): 2024-06-07 15:10:06.544147 I | mvcc: store.index: compact 45
member-0 (2063001): 2024-06-07 15:10:06.549828 I | mvcc: finished scheduled compaction at 45 (took 5.663001ms)
member-0 (2063001): 2024-06-07 15:10:06.565709 I | mvcc: store.index: compact 52
member-0 (2063001): 2024-06-07 15:10:06.569921 I | mvcc: finished scheduled compaction at 52 (took 4.110444ms)
member-0 (2063001): 2024-06-07 15:10:06.576837 I | mvcc: store.index: compact 58
member-0 (2063001): 2024-06-07 15:10:06.580559 I | mvcc: finished scheduled compaction at 58 (took 3.700063ms)
member-0 (2063001): 2024-06-07 15:10:06.593001 I | mvcc: store.index: compact 63
member-0 (2063001): 2024-06-07 15:10:06.598580 I | mvcc: finished scheduled compaction at 63 (took 5.512127ms)
member-0 (2063001): 2024-06-07 15:10:06.610192 I | mvcc: store.index: compact 74
member-0 (2063001): 2024-06-07 15:10:06.610655 W | etcdserver: failed to apply request "compaction:<revision:74 > header:<ID:15517310789174624164 > " with response "" took (2.796µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063001): 2024-06-07 15:10:06.614154 I | mvcc: finished scheduled compaction at 74 (took 3.943601ms)
member-0 (2063001): 2024-06-07 15:10:06.628541 I | mvcc: store.index: compact 76
member-0 (2063001): 2024-06-07 15:10:06.632295 I | mvcc: finished scheduled compaction at 76 (took 3.736361ms)
member-0 (2063001): 2024-06-07 15:10:06.638518 I | mvcc: store.index: compact 79
member-0 (2063001): 2024-06-07 15:10:06.642166 I | mvcc: finished scheduled compaction at 79 (took 3.625864ms)
member-0 (2063001): 2024-06-07 15:10:06.695912 I | mvcc: store.index: compact 81
member-0 (2063001): 2024-06-07 15:10:06.699766 I | mvcc: finished scheduled compaction at 81 (took 3.833413ms)
member-0 (2063001): 2024-06-07 15:10:06.811286 I | mvcc: store.index: compact 88
member-0 (2063001): 2024-06-07 15:10:06.815069 I | mvcc: finished scheduled compaction at 88 (took 3.751339ms)
member-0 (2063001): 2024-06-07 15:10:06.926055 I | mvcc: store.index: compact 98
member-0 (2063001): 2024-06-07 15:10:06.929943 I | mvcc: finished scheduled compaction at 98 (took 3.865914ms)
member-0 (2063001): 2024-06-07 15:10:07.006149 I | mvcc: store.index: compact 104
member-0 (2063001): 2024-06-07 15:10:07.009806 I | mvcc: finished scheduled compaction at 104 (took 3.640982ms)
member-0 (2063001): 2024-06-07 15:10:07.047942 I | mvcc: store.index: compact 105
member-0 (2063001): 2024-06-07 15:10:07.051740 I | mvcc: finished scheduled compaction at 105 (took 3.764455ms)
member-0 (2063001): 2024-06-07 15:10:07.161575 I | mvcc: store.index: compact 113
member-0 (2063001): 2024-06-07 15:10:07.165496 I | mvcc: finished scheduled compaction at 113 (took 3.891012ms)
member-0 (2063001): 2024-06-07 15:10:07.169287 I | mvcc: store.index: compact 114
member-0 (2063001): 2024-06-07 15:10:07.173334 I | mvcc: finished scheduled compaction at 114 (took 4.008702ms)
member-0 (2063001): 2024-06-07 15:10:07.177192 I | mvcc: store.index: compact 115
member-0 (2063001): 2024-06-07 15:10:07.177233 I | mvcc: finished scheduled compaction at 115 (took 15.309µs)
member-0 (2063001): 2024-06-07 15:10:07.336646 I | mvcc: store.index: compact 125
member-0 (2063001): 2024-06-07 15:10:07.340373 I | mvcc: finished scheduled compaction at 125 (took 3.708237ms)
member-0 (2063001): 2024-06-07 15:10:07.396406 I | mvcc: store.index: compact 129
member-0 (2063001): 2024-06-07 15:10:07.400353 I | mvcc: finished scheduled compaction at 129 (took 3.92806ms)
member-0 (2063001): 2024-06-07 15:10:07.421834 I | mvcc: store.index: compact 131
member-0 (2063001): 2024-06-07 15:10:07.426046 I | mvcc: finished scheduled compaction at 131 (took 4.182771ms)
member-0 (2063001): 2024-06-07 15:10:07.432261 W | etcdserver: failed to apply request "compaction:<revision:131 > header:<ID:15517310789174624338 > " with response "" took (9.297µs) to execute, err is mvcc: required revision has been compacted
logger.go:146: 2024-06-07T15:10:07.439+0200 INFO Triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (2063029): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":136,"raft_term":2}}
member-0 (2063001): 2024-06-07 15:10:07.470978 I | mvcc: store.index: compact 135
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (2063037): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":137,"raft_term":2},"version":"3.4.32","dbSize":24576,"leader":14578408409545168728,"raftIndex":171,"raftTerm":2,"raftAppliedIndex":171,"dbSizeInUse":16384}}]
cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
logger.go:146: 2024-06-07T15:10:07.472+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-06-07T15:10:07.472+0200 INFO Waiting for member to exit {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
member-0 (2063001): 2024-06-07 15:10:07.475266 I | mvcc: finished scheduled compaction at 135 (took 4.25666ms)
member-0 (2063001): panic: failpoint panic: {}
member-0 (2063001):
member-0 (2063001): goroutine 120 [running]:
member-0 (2063001): go.etcd.io/gofail/runtime.actPanic(0x0?)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
member-0 (2063001): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:290
member-0 (2063001): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bcde0?)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
member-0 (2063001): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc00051bba0?)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
member-0 (2063001): go.etcd.io/etcd/etcdserver.(*raftNode).start.func1()
member-0 (2063001): /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:242 +0xae5
member-0 (2063001): created by go.etcd.io/etcd/etcdserver.(*raftNode).start in goroutine 109
member-0 (2063001): /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:165 +0x6f
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO server exited {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO Member exited as expected {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063048): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063048): A future release of golang/protobuf will delete this package,
member-0 (2063048): which has long been excluded from the compatibility promise.
member-0 (2063048):
member-0 (2063048): 2024-06-07 15:10:07.488960 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063048): 2024-06-07 15:10:07.489016 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489034 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489040 I | etcdmain: etcd Version: 3.4.32
member-0 (2063048): 2024-06-07 15:10:07.489044 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063048): 2024-06-07 15:10:07.489048 I | etcdmain: Go Version: go1.22.0
member-0 (2063048): 2024-06-07 15:10:07.489052 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063048): 2024-06-07 15:10:07.489056 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063048): 2024-06-07 15:10:07.489120 N | etcdmain: the server is already initialized as member before, starting as etcd member...
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489130 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489546 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063048): 2024-06-07 15:10:07.489553 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063048): 2024-06-07 15:10:07.489558 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063048): 2024-06-07 15:10:07.489562 I | embed: heartbeat = 100ms
member-0 (2063048): 2024-06-07 15:10:07.489566 I | embed: election = 1000ms
member-0 (2063048): 2024-06-07 15:10:07.489571 I | embed: snapshot count = 1000
member-0 (2063048): 2024-06-07 15:10:07.489581 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063048): 2024-06-07 15:10:07.489586 I | embed: initial advertise peer URLs = http://localhost:20001
member-0 (2063048): 2024-06-07 15:10:07.489591 I | embed: initial cluster =
member-0 (2063048): 2024-06-07 15:10:07.498379 I | etcdserver: restarting member ca50e9357181d758 in cluster 34f27e83b3bc2ff at commit index 172
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 became follower at term 2
member-0 (2063048): raft2024/06/07 15:10:07 INFO: newRaft ca50e9357181d758 [peers: [], term: 2, commit: 172, applied: 0, lastindex: 172, lastterm: 2]
member-0 (2063048): 2024-06-07 15:10:07.501601 W | auth: simple token is not cryptographically signed
member-0 (2063048): 2024-06-07 15:10:07.505904 I | mvcc: restore compact to 135
member-0 (2063048): 2024-06-07 15:10:07.508923 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063048): 2024-06-07 15:10:07.509090 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509121 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509492 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063048): 2024-06-07 15:10:07.509806 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:07.509871 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063048): 2024-06-07 15:10:07.509893 I | etcdserver/api: enabled capabilities for version 3.4
member-0 (2063048): 2024-06-07 15:10:07.510650 W | etcdserver: failed to apply request "compaction:<revision:135 > header:<ID:15517310789174624347 > " with response "" took (8.226µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:07.511244 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 is starting a new election at term 2
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became candidate at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became leader at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3
member-0 (2063048): 2024-06-07 15:10:08.716725 I | embed: ready to serve client requests
member-0 (2063048): 2024-06-07 15:10:08.716809 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:08.718833 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
logger.go:146: 2024-06-07T15:10:08.723+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063048}
logger.go:146: 2024-06-07T15:10:08.724+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (2063048): 2024-06-07 15:10:08.734129 I | mvcc: store.index: compact 138
member-0 (2063048): 2024-06-07 15:10:08.734171 W | etcdserver: failed to apply request "compaction:<revision:138 > header:<ID:15517310789175053071 > " with response "" took (1.643µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.737837 I | mvcc: finished scheduled compaction at 138 (took 3.684213ms)
member-0 (2063048): 2024-06-07 15:10:08.749556 I | mvcc: store.index: compact 145
member-0 (2063048): 2024-06-07 15:10:08.755487 I | mvcc: finished scheduled compaction at 145 (took 5.872415ms)
member-0 (2063048): 2024-06-07 15:10:08.785454 I | mvcc: store.index: compact 180
member-0 (2063048): 2024-06-07 15:10:08.793336 I | mvcc: finished scheduled compaction at 180 (took 7.56219ms)
member-0 (2063048): 2024-06-07 15:10:08.801507 I | mvcc: store.index: compact 183
member-0 (2063048): 2024-06-07 15:10:08.807138 I | mvcc: finished scheduled compaction at 183 (took 5.584654ms)
member-0 (2063048): 2024-06-07 15:10:08.811544 I | mvcc: store.index: compact 188
member-0 (2063048): 2024-06-07 15:10:08.817315 I | mvcc: finished scheduled compaction at 188 (took 5.727472ms)
member-0 (2063048): 2024-06-07 15:10:08.823976 I | mvcc: store.index: compact 195
member-0 (2063048): 2024-06-07 15:10:08.823985 W | etcdserver: failed to apply request "compaction:<revision:195 > header:<ID:15517310789175053189 > " with response "" took (2.235µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.827999 I | mvcc: finished scheduled compaction at 195 (took 3.994016ms)
member-0 (2063048): 2024-06-07 15:10:08.907484 I | mvcc: store.index: compact 219
member-0 (2063048): 2024-06-07 15:10:08.911307 I | mvcc: finished scheduled compaction at 219 (took 3.790332ms)
member-0 (2063048): 2024-06-07 15:10:08.938127 I | mvcc: store.index: compact 220
member-0 (2063048): 2024-06-07 15:10:08.941826 I | mvcc: finished scheduled compaction at 220 (took 3.665007ms)
member-0 (2063048): 2024-06-07 15:10:09.027783 I | mvcc: store.index: compact 230
member-0 (2063048): 2024-06-07 15:10:09.031653 I | mvcc: finished scheduled compaction at 230 (took 3.827542ms)
member-0 (2063048): 2024-06-07 15:10:09.157620 I | mvcc: store.index: compact 239
member-0 (2063048): 2024-06-07 15:10:09.161540 I | mvcc: finished scheduled compaction at 239 (took 3.871184ms)
member-0 (2063048): 2024-06-07 15:10:09.232656 I | mvcc: store.index: compact 246
member-0 (2063048): 2024-06-07 15:10:09.236458 I | mvcc: finished scheduled compaction at 246 (took 3.767421ms)
member-0 (2063048): 2024-06-07 15:10:09.413328 I | mvcc: store.index: compact 256
member-0 (2063048): 2024-06-07 15:10:09.417222 I | mvcc: finished scheduled compaction at 256 (took 3.859343ms)
member-0 (2063048): 2024-06-07 15:10:09.438400 I | mvcc: store.index: compact 257
member-0 (2063048): 2024-06-07 15:10:09.442288 I | mvcc: finished scheduled compaction at 257 (took 3.864001ms)
member-0 (2063048): 2024-06-07 15:10:09.508357 I | mvcc: store.index: compact 264
member-0 (2063048): 2024-06-07 15:10:09.513601 I | mvcc: finished scheduled compaction at 264 (took 5.151462ms)
member-0 (2063048): 2024-06-07 15:10:09.657334 I | mvcc: store.index: compact 277
member-0 (2063048): 2024-06-07 15:10:09.661195 I | mvcc: finished scheduled compaction at 277 (took 3.822834ms)
member-0 (2063048): 2024-06-07 15:10:09.678618 I | mvcc: store.index: compact 280
member-0 (2063048): 2024-06-07 15:10:09.682446 I | mvcc: finished scheduled compaction at 280 (took 3.801793ms)
member-0 (2063048): 2024-06-07 15:10:09.692338 I | mvcc: store.index: compact 282
member-0 (2063048): 2024-06-07 15:10:09.696220 I | mvcc: finished scheduled compaction at 282 (took 3.859612ms)
logger.go:146: 2024-06-07T15:10:09.762+0200 INFO Finished traffic
logger.go:146: 2024-06-07T15:10:10.773+0200 INFO Reporting complete traffic {"successes": 824, "failures": 44, "successRate": 0.9493087557603687, "period": "3.319285557s", "qps": 248.24619209464416}
logger.go:146: 2024-06-07T15:10:10.773+0200 INFO Reporting traffic before failure injection {"successes": 408, "failures": 0, "successRate": 1, "period": "996.375923ms", "qps": 409.48400155189216}
logger.go:146: 2024-06-07T15:10:10.773+0200 INFO Reporting traffic during failure injection {"successes": 25, "failures": 43, "successRate": 0.36764705882352944, "period": "1.284883155s", "qps": 19.45702214455446}
logger.go:146: 2024-06-07T15:10:10.774+0200 INFO Reporting traffic after failure injection {"successes": 391, "failures": 1, "successRate": 0.9974489795918368, "period": "1.038026479s", "qps": 376.6763256142332}
logger.go:146: 2024-06-07T15:10:10.774+0200 INFO Finished simulating traffic {"max-revision": 288}
logger.go:146: 2024-06-07T15:10:10.780+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-06-07T15:10:10.925+0200 INFO Linearization success {"duration": "145.013099ms"}
logger.go:146: 2024-06-07T15:10:10.926+0200 INFO Validating watch
logger.go:146: 2024-06-07T15:10:10.926+0200 ERROR Broke watch guarantee {"guarantee": "resumable", "client": 2, "request": {"Key":"","Revision":220,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":false}, "got-event": {"Type":"put-operation","Key":"key6","Value":{"Value":"235","Hash":0},"Revision":221,"IsCreate":false,"PrevValue":null}, "want-event": {"Type":"delete-operation","Key":"key7","Value":{"Value":"","Hash":0},"Revision":220,"IsCreate":false}}
validate.go:49: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
logger.go:146: 2024-06-07T15:10:10.926+0200 INFO Validating serializable operations
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979"}
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO Saving member data dir {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/server-TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO no watch operations for client, skip persisting {"client-id": 1}
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-1/operations.json"}
logger.go:146: 2024-06-07T15:10:10.937+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-2/watch.json"}
logger.go:146: 2024-06-07T15:10:10.937+0200 INFO no KV operations for client, skip persisting {"client-id": 2}
logger.go:146: 2024-06-07T15:10:10.938+0200 INFO no watch operations for client, skip persisting {"client-id": 3}
logger.go:146: 2024-06-07T15:10:10.938+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-3/operations.json"}
logger.go:146: 2024-06-07T15:10:10.939+0200 INFO no watch operations for client, skip persisting {"client-id": 4}
logger.go:146: 2024-06-07T15:10:10.939+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-4/operations.json"}
logger.go:146: 2024-06-07T15:10:10.940+0200 INFO no watch operations for client, skip persisting {"client-id": 5}
logger.go:146: 2024-06-07T15:10:10.940+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-5/operations.json"}
logger.go:146: 2024-06-07T15:10:10.941+0200 INFO no watch operations for client, skip persisting {"client-id": 6}
logger.go:146: 2024-06-07T15:10:10.941+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-6/operations.json"}
logger.go:146: 2024-06-07T15:10:10.943+0200 INFO no watch operations for client, skip persisting {"client-id": 7}
logger.go:146: 2024-06-07T15:10:10.943+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-7/operations.json"}
logger.go:146: 2024-06-07T15:10:10.944+0200 INFO no watch operations for client, skip persisting {"client-id": 8}
logger.go:146: 2024-06-07T15:10:10.944+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-8/operations.json"}
logger.go:146: 2024-06-07T15:10:10.945+0200 INFO no watch operations for client, skip persisting {"client-id": 9}
logger.go:146: 2024-06-07T15:10:10.945+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-9/operations.json"}
logger.go:146: 2024-06-07T15:10:10.946+0200 INFO no watch operations for client, skip persisting {"client-id": 10}
logger.go:146: 2024-06-07T15:10:10.946+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-10/operations.json"}
logger.go:146: 2024-06-07T15:10:10.947+0200 INFO Saving visualization {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/history.html"}
logger.go:146: 2024-06-07T15:10:11.016+0200 INFO killing server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:11.016+0200 INFO stopping server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:11.021+0200 INFO stopped server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
--- FAIL: TestRobustnessExploratory (5.24s)
--- FAIL: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 (5.23s)
Report: 1717749910517706368.tar.gz
For this one, the tombstone of key6 would be deleted by the following compact request compaction:<revision:141 >.
Not sure when the watcher setups watch request. If it setups after compactor deletes that key6, it won't get the event.
However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.
2 178 norm header:<ID:15517310785104357973 > compaction:<revision:137 >
2 179 norm header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
3 180 norm
3 181 norm header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
3 182 norm header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
3 183 norm header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
3 184 norm header:<ID:15517310785104689163 > compaction:<revision:141 >
I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.
Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.
Correct. :( Currently, watcher isn't able to send out all the possible deleted events.
Did the client side receive an ErrCompacted? If yes, then it should be expected behaviour.
- Since the real time events are cached, so as long as etcdserver doesn't restart, then the events won't be lost.
- But if the etcdserver restarts for whatever reason after compaction but before sending out the watch response, then the client side may miss some compacted events. But the client will definitely get the latest value against any watched key.
- As long as users don't update a key too frequent, it isn't a big problem
Did the client side receive an ErrCompacted? If yes,
No, it would be recorded in the report.
However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.
@fuweid, don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.
One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way. There might be a bug there too.
Still there is a possibility of a bug in recording client responses, it is a little more complicated to record history of watch than KV requests.
https://github.com/etcd-io/etcd/blob/8a0054fecb0d52d8c7d513652c770a533a4c1037/tests/robustness/client/client.go#L268-L304
@fuweid Please feel free to ping me on slack if you want a quick discussion on this issue.
One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way.
It's expected behaviour and by design as long as etcdserver doesn't return any error (e.g ErrCompacted).
Added logs to ensure that issue is not one robustness test, got:
They clearly show that revision 125 was not provided to client.
Adding new event response to watch, revs: 124...124
./bin/etcdctl (./bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (3237740): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":123,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":20480,"leader":14578408409545168728,"raftIndex":190,"raftTerm":2,"raftAppliedIndex":189,"dbSizeInUse":16384,"storageVersion":"3.6.0"}}]
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): {"level":"info","ts":"2024-06-10T14:40:21.080206+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
logger.go:146: 2024-06-10T14:40:21.080+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-06-10T14:40:21.080+0200 INFO Waiting for member to exit {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): panic: failpoint panic: {}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): goroutine 140 [running]:
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.actPanic(0x10100409d13?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*term).do(...)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/terms.go:290
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bdce0?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc000257c90?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 171
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO server exited {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO Member exited as expected {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
Adding new error response to watch, err: rpc error: code = Unavailable desc = error reading from server: EOF
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO spawning process {"args": ["./bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001", "--snapshot-count=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=100", "--heartbeat-interval=101", "--experimental-compaction-batch-limit=10", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--initial-cluster-token=new", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2003573077/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100443+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100561+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100609+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100628+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./bin/etcd","--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","--snapshot-count=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=100","--heartbeat-interval=101","--experimental-compaction-batch-limit=10","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--initial-cluster-token=new","--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001","--initial-cluster-state=new"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100701+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","dir-type":"member"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100719+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100733+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101260+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101669+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"8a0054fec","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":100,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.102192+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0000924a0}"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109701+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109759+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db","took":"7.653391ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110518+0200","caller":"etcdserver/bootstrap.go:412","msg":"recovered v2 store from snapshot","snapshot-index":102,"snapshot-size":"7.2 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110534+0200","caller":"storage/backend.go:108","msg":"Skipping snapshot backend","consistent-index":191,"snapshot-index":102}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110547+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":20480,"backend-size":"20 kB","backend-size-in-use-bytes":12288,"backend-size-in-use":"12 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111143+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":191}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111158+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111260+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111357+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111368+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111377+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111391+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111403+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111447+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111480+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111494+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [ca50e9357181d758], term: 2, commit: 191, applied: 102, lastindex: 192, lastterm: 2]"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.115179+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.118827+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":113}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.119624+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":124}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123574+0200","caller":"mvcc/kvstore.go:413","msg":"resume scheduled compaction","scheduled-compact-revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123661+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141549+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141594+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141643+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":120,"took":"17.934179ms","hash":1204195312,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141667+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141660+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141708+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141769+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141786+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141776+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141804+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141807+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.616829+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617106+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617166+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617184+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617211+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623853+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623883+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623904+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623916+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628269+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628273+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628459+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628566+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628867+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.630495+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-10T14:40:21.633+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 3237749}
logger.go:146: 2024-06-10T14:40:21.635+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
Adding response watch: header:<cluster_id:238453183653593855 member_id:14578408409545168728 revision:125 raft_term:3 > created:true
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.114627+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":125}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131637+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":125,"took":"16.956782ms","hash":981787830,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131665+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":981787830,"revision":125,"compact-revision":120}
Adding new event response to watch, revs: 126...132
Added logs to ensure that issue is not one robustness test, got:
Where did you add the log? Also suggest to provide a clear summary on the exact steps you did.
Code https://github.com/etcd-io/etcd/pull/18145
TL; DR
Delete events may be missing in watch response. This issue can only be reproduced when the compact revision happens to be a delete operation.
How to reproduce this issue?
- Start an one member etcd cluster in one terminal
- Execute commands below in another terminal
$ etcdctl put k1 v2 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
$ etcdctl put k1 v3 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2}}
$ etcdctl put k1 v4 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":2}}
$ etcdctl del k1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":5,"raft_term":2},"deleted":1}
$ etcdctl put k1 v6 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":6,"raft_term":2}}
- Compact revision 5
$ etcdctl compact 5
compacted revision 5
- Watch starting from rev 5
$ etcdctl watch k1 --rev 5
PUT
k1
v6
Obviously the deletion operation with revision 5 is missing.
Alternatively, If you want to try the robustness test, then run command below on https://github.com/etcd-io/etcd/pull/18145,
$ go test -run TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 -v --count 100 --timeout 1h -failfast
Expected behaviour
As mentioned in https://github.com/etcd-io/etcd/issues/18089#issuecomment-2156720357, some revisions may be compacted, and it may impact the watch clients. But it shouldn't never drop any events quietly.
Expected behaviour:
- Either the client side receives all events: no any events are dropped
- or the client side gets an
ErrCompactedresponse.
Root cause
Usually etcd keeps the compact revision (see code below). For example, when you compact revision 5, actually etcd only compacts revisions <=4.
https://github.com/etcd-io/etcd/blob/8a0054fecb0d52d8c7d513652c770a533a4c1037/server/storage/mvcc/key_index.go#L256
But if the compact revision is a delete operation, then etcd doesn't keep it (in other words, etcd will compact/remove it) (see code below),
https://github.com/etcd-io/etcd/blob/8a0054fecb0d52d8c7d513652c770a533a4c1037/server/storage/mvcc/key_index.go#L244-L247
But the problem is that etcdserver doesn't return ErrCompacted if the startWatchRevision happens to be the last compact revision. Because the if condition is if w.minRev < compactRev instead of if w.minRev <= compactRev.
https://github.com/etcd-io/etcd/blob/8a0054fecb0d52d8c7d513652c770a533a4c1037/server/storage/mvcc/watcher_group.go#L253
Note this isn't a regression. It's an old issue.
Potential solution
It's open to discussion.
When executing compaction operation, we should only keep the latest revision for each key. For example, if the latest revision for "k1" is 10, and user compacts 10, then we should keep 10; otherwise if user compacts any revision <=9, we just compact it, there is no point to keep it. Note that different keys may have different latest revision.
The available is the revisions to keep,
https://github.com/etcd-io/etcd/blob/8a0054fecb0d52d8c7d513652c770a533a4c1037/server/storage/mvcc/key_index.go#L208
We also need to change the if condition to if w.minRev <= compactRev,
https://github.com/etcd-io/etcd/blob/8a0054fecb0d52d8c7d513652c770a533a4c1037/server/storage/mvcc/watcher_group.go#L253
So the watch client will get an ErrCompacted response.
Impact
All versions (including 3.4, 3.5 and main) are impacted.
Major, but it might not be easy to reproduce.
- As mentioned above, it can only be reproduced when the compact revision happens to be a delete operation.
- Also usually compaction operation isn't that frequent in production environment. Normally the watch delay won't be greater than compaction interval.
Immediate action
Create a simple e2e test to reliably reproduce this issue. Please refer to the section "How to reproduce this issue?" above.
Brainstorm the solution.
don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.
Hi @serathius sorry for late reply. In this case 1717749910517706368.tar.gz, both key8 and key9's latest revision is 141. It won't be compacted. But the revision 141 for key6 is tombstone. It could be deleted before sending delete-operation event to client.(Maybe we should log watch-operations in robustness).
IMO, the robustness replay assumes that the delete-operation event is just sent out after applying to db. Even if the compact request is following that delete request, the robustness replay always assumes that the client should receive that delete-operation event.
However, in real world, delete-operation event depends on compactor which is handled in background and we can't control in replay. That's why I said that watch can't guarantee the client can see all the events in current design.
2 178 norm header:<ID:15517310785104357973 > compaction:<revision:137 >
2 179 norm header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
3 180 norm
3 181 norm header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
3 182 norm header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
3 183 norm header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
3 184 norm header:<ID:15517310785104689163 > compaction:<revision:141 >
Please feel free to ping me on slack if you want a quick discussion on this issue.
@ahrtr will ping you tomorrow. Thanks!
As mentioned above, it can only be reproduced when the compact revision happens to be a delete operation. and the delete operation is the last operation for the key. If there is any operations following the delete operation on the key (i.e. adding the key back), then the issue can't be reproduced.
This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.
Also usually compaction operation isn't that frequent in production environment.
It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.
I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window. This should be pretty rare for watch to be 5 minutes behind, however recently I have seen case where lock contention on watch cache caused it to be behind even 2 hours.
@ahrtr confirmed the bug being present on both release-3.4 and release-3.5 branches
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
It's important to know if this is a regression in some newer version, or it was always there. If this is a new bug we might need to consider an announcement.
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
Obviously this isn't a regression. It's an old issue. Also updated my above comment.
This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.
Pls see updated section "Impact" above. added and the delete operation is the last operation for the key
It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.
I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window.
- 5 minutes isn't frequent as compare to watch delay.
- If you read the reproducing steps in my above comment, there is also very low possibility to see this issue on first watch response.
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
Following the clear manual etcdctl reproduce steps provided by @ahrtr I reproduced the issue on v3.4.0 so it looks like this has been present for at least ~5 years.
Thanks, @jmhbnz, for confirming this on v3.4.0!
I appreciate everyone's insights so far. To clarify my earlier question, it's sometimes valuable to confirm even seemingly obvious details, just to ensure we're all on the same page and can make the most informed decisions going forward.
Now that we know this isn't a recent regression, we can focus on understanding the underlying cause and potential solutions.
Just had a quick talk with @serathius , we need to clearly define the behaviour of compaction, and also the impact on watch, Please anyone feel free to comment in https://docs.google.com/document/d/1APJE38DxENsRLLVapRz1CQ6UD4-uYaFutO12Y01VcNw/edit
Eventually we need to document the behaviour in etcd's official document
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
Following the clear manual
etcdctlreproduce steps provided by @ahrtr I reproduced the issue onv3.4.0so it looks like this has been present for at least ~5 years.
+1 on successfully reproducing the issue
Actions
- Create a simple e2e test to reproduce this issue. Refer to the section "How to reproduce this issue" in https://github.com/etcd-io/etcd/issues/18089#issuecomment-2158891301;
- Update the implementation to ensure it aligns with doc
- Please feel free to comment on the doc
- Update etcd's official document to clarify the compaction behavior