No warning of use of incompatible images via env vars on skupper init
Describe the bug
Skupper allows the images used by skupper init to be overridden with environment variables such as QDROUTERD_IMAGE or SKUPPER_SERVICE_CONTROLLER_IMAGE.
However, it does not check whether those images are compatible with the CLI version: the site is created, but later operations may fail.
How To Reproduce
- Install skupper 1.4 CLI
- Set the image variables to point to 1.5 (see below)
- Run skupper init on two sites
- Link them with token create and link create
- Inspect the sites with the various status commands, using both the 1.5 and 1.4 CLIs
The result is that most status commands using the 1.5 CLI fail with an error like Error: configmaps "skupper-network-status" not found, while the 1.4 CLI works most of the time.
The created link, however, fails with an error like the following:
Link link1 not connected (Failed to redeem claim: Post "https://claims-dh-cross.apps.x.y.z.test.com:443/a8e18135-6572-11ef-8887-3e990af28d85?site-version=1.4.3": EOF)
skupper version works with the 1.5 CLI, but it lists all versions as 1.5, which masks the problem.
On the Red Hat build, it is possible to detect that the site was created with inconsistent versions by looking first at the images being used, and then at the deployment labels, where entries like rht.comp_ver: 1.4.4 will be present. For upstream, I only found references to the CLI version in the skupper-internal config map.
The image variables used for the reproduction:
export QDROUTERD_IMAGE=quay.io/skupper/skupper-router:2.5.1
export SKUPPER_SERVICE_CONTROLLER_IMAGE=quay.io/skupper/service-controller:1.5.3
export SKUPPER_CONTROLLER_PODMAN_IMAGE=quay.io/skupper/controller-podman:1.5.3
export SKUPPER_CONFIG_SYNC_IMAGE=quay.io/skupper/config-sync:1.5.3
export SKUPPER_FLOW_COLLECTOR_IMAGE=quay.io/skupper/flow-collector:1.5.3
export SKUPPER_SITE_CONTROLLER_IMAGE=quay.io/skupper/site-controller:1.5.3
Expected behavior
I'm not sure what to expect. Perhaps a message stating that the images being used are incompatible with the CLI, or some documentation about this possibility.
These environment variables are very useful in testing, so the CLI should probably not fail when it detects this situation (i.e., it should still create the site, but warn about the mismatch somehow).
Outside of testing, this scenario is probably rare 'in the wild', but I thought it worth recording, as it is fairly difficult to figure out what is happening when it does occur.
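To make the "warn but do not fail" idea concrete, here is a rough sketch of a pre-flight check that could be run before skupper init. It is not existing Skupper behavior. The function names (minor_of, tag_of, warn_on_mismatch) are hypothetical, the CLI version is passed in by hand rather than parsed from skupper version, and it assumes that an image is compatible when its tag has the same major.minor as the CLI. QDROUTERD_IMAGE is left out on purpose, because the router image follows its own version numbering (2.5.1 in the reproduction).

```shell
# Hypothetical pre-flight check: warn (never fail) when an image override
# points at a different major.minor version than the CLI.

# Print the major.minor part of a version such as "1.5.3" or "v1.4.4".
# Prints nothing if the argument does not start with a version number.
minor_of() {
    printf '%s\n' "$1" | sed -n 's/^v\{0,1\}\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Print the tag of an image reference, or nothing if it has no tag.
# Only the last path component is inspected, so a registry port
# (localhost:5000/...) is not mistaken for a tag.
tag_of() {
    case "${1##*/}" in
        *:*) printf '%s\n' "${1##*:}" ;;
    esac
}

# warn_on_mismatch CLI_VERSION
# Look at each override variable that is set and print one warning on
# stderr per image whose tag disagrees with the CLI's major.minor.
warn_on_mismatch() {
    cli_minor=$(minor_of "$1")
    for var in SKUPPER_SERVICE_CONTROLLER_IMAGE SKUPPER_CONTROLLER_PODMAN_IMAGE \
               SKUPPER_CONFIG_SYNC_IMAGE SKUPPER_FLOW_COLLECTOR_IMAGE \
               SKUPPER_SITE_CONTROLLER_IMAGE; do
        eval "image=\${$var:-}"          # indirect lookup of the variable
        [ -n "$image" ] || continue      # not overridden, nothing to check
        img_minor=$(minor_of "$(tag_of "$image")")
        if [ -z "$img_minor" ]; then
            echo "warning: $var=$image has no recognizable version tag" >&2
        elif [ "$img_minor" != "$cli_minor" ]; then
            echo "warning: $var=$image is $img_minor, but the CLI is $cli_minor" >&2
        fi
    done
}
```

With the export lines from this report in place, calling warn_on_mismatch 1.4.4 would print one warning per overridden image and still return normally, so site creation could proceed afterwards.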
Environment details
- Skupper CLI: 1.5.3 and 1.4.4
- Skupper Operator (if applicable): N/A
- Platform: Any Kubernetes (tested against OpenShift)
Additional context
I've run a full integration test run using this scenario, and only 11 of 103 tests failed. Not all of them use the CLI, but the test code base was still at 1.4 while pointing to 1.5 images. This indicates that most operations still work while some fail, which makes the scenario harder to debug.
@pwright, this is the one we talked about earlier today.