
Bug Report: local_example is only testing etcd topo in matrix build

Open · morgo opened this issue 6 months ago · 1 comment

Overview of the Issue

The GitHub Actions workflow exports TOPO for each matrix entry here: https://github.com/vitessio/vitess/blob/42317c0ecf025c47fc2bcdc2eaf5267b93ca8d00/.github/workflows/local_example.yml#L89-L93

However, the TOPO export is not propagated into the container. You can confirm this in any recent run of the workflow: https://github.com/vitessio/vitess/actions?query=workflow%3Alocal_example

Expand the "local_example" step in one of the zk2 or consul jobs, and you'll see that etcd is configured anyway:

++ ETCD_SERVER=localhost:2379
++ TOPOLOGY_FLAGS='--topo_implementation etcd2 --topo_global_server_address localhost:2379 --topo_global_root /vitess/global'
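To illustrate why the matrix value never arrives, here is a minimal POSIX sh sketch. It uses `env -i` as a stand-in for the container boundary (an assumption for illustration; the actual container launch in CI is not reproduced here): a child started with an empty environment does not see the host's exported TOPO, so the topology checks compare against an empty string and fall through to etcd. With Docker itself, a host variable is only visible inside the container if it is passed explicitly, for example with `docker run -e TOPO`. The helper function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: `env -i` stands in for the container boundary. It starts a
# child with an empty environment, much as a container does when TOPO is not
# passed through with `-e TOPO`.

# Hypothetical helper: what a child with a cleared environment sees for TOPO.
child_topo_unforwarded() {
  env -i /bin/sh -c 'echo "${TOPO:-<unset>}"'
}

# Hypothetical helper: same child, but TOPO is forwarded explicitly
# (analogous to `docker run -e TOPO ...`).
child_topo_forwarded() {
  env -i TOPO="$TOPO" /bin/sh -c 'echo "${TOPO:-<unset>}"'
}

export TOPO=consul
echo "host:              $TOPO"                       # host:              consul
echo "child, not passed: $(child_topo_unforwarded)"   # child, not passed: <unset>
echo "child, passed:     $(child_topo_forwarded)"     # child, passed:     consul
```

The "not passed" case matches what the CI logs show: the value exists on the host side but is empty where the example scripts actually run.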

Reproduction Steps

Observed in CI, not user-facing.

Binary Version

affects main

Operating System and Environment details

n/a

Log Fragments

++ alias 'vtctl=vtctl --config-file-not-found-handling=ignore'
++ '[' '' = zk2 ']'
++ '[' '' = consul ']'
++ ETCD_SERVER=localhost:2379
++ TOPOLOGY_FLAGS='--topo_implementation etcd2 --topo_global_server_address localhost:2379 --topo_global_root /vitess/global'
++ mkdir -p /vt/vtdataroot/etcd
++ mkdir -p /vt/vtdataroot/tmp
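The xtrace above suggests a topology selection shaped roughly like the following. This is a hypothetical POSIX sh reconstruction, not the actual Vitess script: only the etcd values are copied from the log, and the zk2 and consul branches are placeholders that just record an implementation name (`select_topology` and `TOPO_IMPL` are made-up names).

```shell
#!/bin/sh
# Hypothetical reconstruction of the branch logic visible in the xtrace.
# Only the etcd values come from the log; zk2/consul are placeholders.
select_topology() {
  if [ "${TOPO}" = "zk2" ]; then
    TOPO_IMPL=zk2        # placeholder: real zk2 flags are not shown in the log
  elif [ "${TOPO}" = "consul" ]; then
    TOPO_IMPL=consul     # placeholder: real consul flags are not shown in the log
  else
    # Fallthrough: an empty or unset TOPO lands here, as in the CI logs.
    TOPO_IMPL=etcd2
    ETCD_SERVER=localhost:2379
    TOPOLOGY_FLAGS="--topo_implementation etcd2 --topo_global_server_address ${ETCD_SERVER} --topo_global_root /vitess/global"
  fi
}

for t in "" zk2 consul; do
  TOPO="$t"
  select_topology
  echo "TOPO='${TOPO}' -> ${TOPO_IMPL}"
done
# TOPO='' -> etcd2
# TOPO='zk2' -> zk2
# TOPO='consul' -> consul
```

Because the default branch is silent, a missing TOPO looks exactly like a deliberate etcd run. A fail-fast check such as `: "${TOPO:?TOPO is not set}"` would turn that into an error, but it would probably belong only in the CI path, since local users are presumably expected to get etcd by default.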

morgo · Jun 13 '25 16:06

It looks like the consul topo is actually broken. Since it appears to have been deprecated for a while, I'm going to try removing it from local_example instead.

morgo · Jun 13 '25 18:06

This is the error when running consul:

export TOPO=consul
go run test.go -print-log -follow -keep-data -retry=1 local_example
E0707 13:58:47.022187   14150 main.go:60] rpc error: code = Unknown desc = RebuildSrvVSchema([]) = GetKnownCells failed: node doesn't exist: vitess/global/cells/
ERROR: Failed to create and configure the commerce keyspace
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
2025/07/07 07:58:48 mysql80.local_example: saving test output to _test/20250707-075519.25991/mysql80.local_example-1.1.log
2025/07/07 07:58:48 mysql80.local_example: FAILED (try 1/1) in 2m40.8s: exit status 1
2025/07/07 07:58:48 mysql80.local_example: retry limit exceeded
2025/07/07 07:58:48 Removing temp dir /var/folders/hm/g9jyxbtj40z2bh9gmj5ktdqh0000gn/T/vt_1821828988
2025/07/07 07:58:54 ============================================================
2025/07/07 07:58:54 mysql80.local_example                   	FAIL (1 tries)
2025/07/07 07:58:54 ============================================================
2025/07/07 07:58:54 0 PASSED, 0 FLAKY, 1 FAILED, 0 SKIPPED
2025/07/07 07:58:54 Total time: 3m34.2s

I tried stepping through it manually: the consul agent process is running, and running consul-up.sh on its own worked fine.

morgo · Jul 07 '25 14:07

@morgo I recently added a bug fix that needed consul to run, and I was able to add a test for it that passes in CI: https://github.com/vitessio/vitess/pull/18434/files#diff-b521b09bb3810a48fe1c197b7468f5b40efc4e65b3622d434a94901392a2b918

It looks like these are two different ways of running consul.

harshit-gangal · Jul 16 '25 15:07