[BUG] Operator Framework gives hash key "kind" error when running on latest Ubuntu client
Describe the bug
Operator Framework gives a hash key "kind" error when running on the latest Ubuntu client.
To Reproduce
Set up CNF-Testsuite on the latest Ubuntu 22.04 (this might occur on other OSes as well).
Expected behavior
The spec test that verifies the Operator Framework should not fail with Crystal hash errors.
Error
$ crystal spec --tag operator_test
current_branch during compile: "main"
current_tag during compile: "v0.41.0"
Note: switching to 'tags/v0.22.0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at e08415d12 (e2e-fix) do not check for updated csv that is likely to be GC'd (#2837)
E.
Failures:
1) Operator 'operator_test' test if operator is being used
Missing hash key: "kind" (KeyError)
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/hash.cr:1077:11 in '[]'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/json/any.cr:103:7 in '[]'
from lib/kubectl_client/kubectl_client.cr:590:32 in 'pods_by_resource'
from spec/workload/operator_spec.cr:34:308 in '->'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/example.cr:45:13 in 'internal_run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/example.cr:33:16 in 'run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:339:7 in 'run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:156:7 in 'run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/dsl.cr:220:7 in '->'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/at_exit_handlers.cr:14:19 in 'run'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/main.cr:50:14 in 'exit'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/main.cr:45:5 in 'main'
from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/main.cr:127:3 in 'main'
from /lib/x86_64-linux-gnu/libc.so.6 in '??'
from /lib/x86_64-linux-gnu/libc.so.6 in '__libc_start_main'
from /home/pair/.cache/crystal/crystal-run-spec.tmp in '_start'
from ???
Finished in 16:53 minutes
2 examples, 0 failures, 1 errors, 0 pending
Failed examples:
crystal spec spec/workload/operator_spec.cr:13 # Operator 'operator_test' test if operator is being used
Device (please complete the following information):
- OS Linux
- Distro Ubuntu
- Version 22.04
- Architecture x86
- Crystal 1.6.0
- cnf-testsuite version v0.41.0
This also happens when running the new sig_term_handled test.
May need some help to reproduce this. The test runs as expected on Ubuntu 22.04.
I'll try running it a few times to see if this test fails randomly.
@agentpoyo I was able to reproduce the bug with the sig_term_handled test.
The issue with the sig_term_handled test was the sample CNF being used, which is the CNF used in the specs for that test.
Its PodDisruptionBudget used apiVersion policy/v1beta1. As of K8s 1.25, only policy/v1 is served. I updated the sample CNF and pushed a change for GitHub Actions to run. If the specs work as expected, that will confirm my assumption.
Confirmed. The issue is actually about the k8s version. It won't be caught on GitHub Actions.
- On cnfdev4, kind creates k8s v1.25 clusters.
- kind on GitHub Actions creates k8s clusters with v1.24. It is very likely that this was the same on cnfdev3 too.
PodDisruptionBudget policy/v1beta1 was removed in k8s 1.25, so sample CNFs that use it have to be updated. I'll create a ticket for this.
This might be the reason why this issue isn't caught on GitHub Actions.
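For reference, the manifest change itself is small. Below is a minimal sketch, assuming the sample CNF's PodDisruptionBudget lives in a plain YAML file. The function name `migrate_pdb_api_version` is hypothetical and not part of the testsuite.

```shell
# Hypothetical helper: rewrite the removed PodDisruptionBudget API version
# (policy/v1beta1) to the stable one (policy/v1) in a manifest file.
# Assumption: the file only contains PodDisruptionBudget resources under
# policy/v1beta1; every matching apiVersion line in the file is rewritten.
migrate_pdb_api_version() {
  manifest="$1"
  # Use a temp file instead of `sed -i` so this works with GNU and BSD sed.
  tmp="$(mktemp)" || return 1
  sed 's|^\([[:space:]]*apiVersion:[[:space:]]*\)policy/v1beta1[[:space:]]*$|\1policy/v1|' \
    "$manifest" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the contents back so the original file's permissions are kept.
  cat "$tmp" > "$manifest"
  rm -f "$tmp"
}
```

It would be called with the path to the manifest, for example `migrate_pdb_api_version path/to/pdb.yaml` (the path here is a placeholder).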
@HashNuke is this issue still relevant?
@lixuna Yes. This issue is still relevant.
I just re-read the details and checked the attached commit on this ticket.
There's a branch bug/1753 with some unmerged changes, but it requires other work to update kind on GitHub Actions. We can let any of the contributors help resolve this, since one of them is already working on kind-related issues.
I was confused when I saw Denver's commit on this ticket and thought this was fixed. The commit is attributed to Denver, but that was actually me committing and pushing on the pair machine.
Here is a summary of what is happening
- GitHub Actions is using an old version of kind (1.23), which comes with an old version of k8s.
- One of our sample CNFs is using a beta version of PodDisruptionBudget (see the attached screenshot, which notes the change required).
Here is why this is a problem
- When developers or contributors work on the testsuite, their k8s version is usually newer (1.25 and above), where this beta version of PodDisruptionBudget has been removed in favour of the new stable version. That is when the issue described in this ticket occurs.
- GitHub Actions runs kind 1.23, so if we make the upgrade now, a failure would occur on GitHub Actions.
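One way to see the mismatch on a given cluster is to check which `policy` API versions it serves. Below is a rough sketch, assuming `kubectl` is pointed at the cluster under test. `kubectl api-versions` is an existing command; the function name `served_pdb_api_version` is made up for illustration.

```shell
# Hypothetical helper: read `kubectl api-versions` output on stdin and print
# the newest PodDisruptionBudget API version that the cluster serves.
served_pdb_api_version() {
  versions="$(cat)"
  if printf '%s\n' "$versions" | grep -qx 'policy/v1'; then
    echo 'policy/v1'
  elif printf '%s\n' "$versions" | grep -qx 'policy/v1beta1'; then
    echo 'policy/v1beta1'
  else
    # Neither version is listed; signal failure to the caller.
    echo 'none'
    return 1
  fi
}

# Usage against a live cluster:
#   kubectl api-versions | served_pdb_api_version
```

On k8s 1.25 and above, only `policy/v1` should be listed, which is why a manifest that still uses `policy/v1beta1` is rejected there.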
Here is what is required
The following changes have to be made in a single PR.
- Upgrade to a version of the kind k8s image that causes the least disruption for GitHub Actions. (As I remember, newer kind k8s images are based on distroless, and that opens up other issues.)
- Update the sample CNF to use the stable version of PodDisruptionBudget.
I've added a contributions-welcome tag to this ticket so that anyone interested can pick this up to lend a hand.