
[BUG] Operator Framework gives hash key "kind" error when running on latest Ubuntu client

Open agentpoyo opened this issue 2 years ago • 9 comments

Describe the bug: Operator Framework gives hash key "kind" error when running on latest Ubuntu client

To Reproduce: Set up CNF-Testsuite on the latest Ubuntu 22.04 (this might be occurring on other OSes as well)

Expected behavior: The spec test that verifies the operator framework should not fail with Crystal hash errors

Error

$ crystal spec --tag operator_test
current_branch during compile: "main"
current_tag during compile: "v0.41.0"
Note: switching to 'tags/v0.22.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at e08415d12 (e2e-fix) do not check for updated csv that is likely to be GC'd (#2837)
E.

Failures:

  1) Operator 'operator_test' test if operator is being used

       Missing hash key: "kind" (KeyError)
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/hash.cr:1077:11 in '[]'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/json/any.cr:103:7 in '[]'
         from lib/kubectl_client/kubectl_client.cr:590:32 in 'pods_by_resource'
         from spec/workload/operator_spec.cr:34:308 in '->'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/example.cr:45:13 in 'internal_run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/example.cr:33:16 in 'run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:339:7 in 'run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:18:23 in 'internal_run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/context.cr:156:7 in 'run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/spec/dsl.cr:220:7 in '->'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/at_exit_handlers.cr:14:19 in 'run'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/main.cr:50:14 in 'exit'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/main.cr:45:5 in 'main'
         from /home/pair/.crenv/versions/1.6.0/share/crystal/src/crystal/main.cr:127:3 in 'main'
         from /lib/x86_64-linux-gnu/libc.so.6 in '??'
         from /lib/x86_64-linux-gnu/libc.so.6 in '__libc_start_main'
         from /home/pair/.cache/crystal/crystal-run-spec.tmp in '_start'
         from ???
       

Finished in 16:53 minutes
2 examples, 0 failures, 1 errors, 0 pending

Failed examples:

crystal spec spec/workload/operator_spec.cr:13 # Operator 'operator_test' test if operator is being used
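
For reference, the trace points at a hash-style [] lookup on JSON::Any inside pods_by_resource in kubectl_client, which raises KeyError when a resource's JSON has no "kind" field. Below is a minimal sketch of a nil-safe lookup of that sort, using only the Crystal stdlib; the data and names are illustrative, not the actual kubectl_client code.

require "json"

# Hypothetical resource list; the second entry is missing "kind",
# which is the situation the KeyError above suggests.
resources = JSON.parse(%([{"kind":"Pod","metadata":{"name":"a"}},{"metadata":{"name":"b"}}]))

resources.as_a.each do |resource|
  # []? returns nil instead of raising KeyError when the key is absent.
  kind = resource["kind"]?.try(&.as_s?)
  if kind.nil?
    puts "skipping resource without a kind field"
    next
  end
  puts "resource kind: #{kind}"
end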

Device (please complete the following information):

  • OS Linux
  • Distro Ubuntu
  • Version 22.04
  • Architecture x86
  • Crystal 1.6.0
  • cnf-testsuite version v0.41.0

agentpoyo avatar Feb 28 '23 19:02 agentpoyo

This is also happening when running the new sig_term_handled test.

agentpoyo avatar Mar 04 '23 23:03 agentpoyo

I may need some help to reproduce this. The test runs as expected on Ubuntu 22.04.

CleanShot 2023-03-05 at 12 07 56@2x

I'll try running it a few times to see if this test fails randomly.

HashNuke avatar Mar 05 '23 05:03 HashNuke

@agentpoyo I was able to reproduce the bug with the sig_term_handled test.

CleanShot 2023-03-05 at 14 32 34@2x

HashNuke avatar Mar 05 '23 07:03 HashNuke

The issue with the sig_term_handled test was the sample CNF being used, i.e. the CNF used in the specs for that test.

The PodDisruptionBudget apiVersion was policy/v1beta1 earlier; since K8s 1.25 only the stable policy/v1 API is available. I updated the sample CNF and pushed a change for GitHub Actions to run. That will confirm my assumption if the specs work as expected.
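
For reference, the manifest change is just the apiVersion bump. A minimal sketch of a PodDisruptionBudget under the stable API follows; the name and selector are illustrative, not taken from the actual sample CNF.

# policy/v1beta1 was removed in Kubernetes 1.25; the stable API is policy/v1.
apiVersion: policy/v1        # previously: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb          # illustrative name, not from the sample CNF
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: example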

HashNuke avatar Mar 08 '23 17:03 HashNuke

Confirmed. The issue is actually about the k8s version. It won't be caught on GitHub Actions.

  • On cnfdev4, kind creates k8s v1.25 clusters.
  • kind on GitHub Actions creates k8s clusters with v1.24. Very likely that this was the same on cnfdev3 too.

PodDisruptionBudget policy/v1beta1 was removed in k8s 1.25, so sample CNFs that use it have to be updated. I'll create a ticket for this.

This might be the reason why this issue isn't caught on GitHub Actions.

HashNuke avatar Mar 09 '23 10:03 HashNuke

@HashNuke is this issue still relevant?

lixuna avatar Feb 21 '24 22:02 lixuna

@lixuna Yes. This issue is still relevant.

HashNuke avatar Feb 22 '24 09:02 HashNuke

I just re-read the details and checked the attached commit on this ticket.

There's a branch bug/1753 with some unmerged changes, but it requires other work to update kind on GitHub Actions. We can let any of the contributors help resolve this, since one of them is already working on kind-related issues.

I was confused when I saw Denver's commit on this ticket and thought this was fixed. The commit is attributed to Denver, but that was actually me committing and pushing on the pair machine.

HashNuke avatar Feb 22 '24 09:02 HashNuke

Here is a summary of what is happening:

  • GitHub Actions is using an old version of kind, which comes with an old version of k8s (1.23).
  • One of our sample CNFs uses the beta version (policy/v1beta1) of PodDisruptionBudget (see the attached screenshot, which notes the change required).

Here is why this is a problem:

  • When developers or contributors work on the testsuite, their k8s version is usually newer (1.25 and above), where this beta version of PodDisruptionBudget has been removed in favour of the stable version, so the issue described in this ticket occurs.
  • GitHub Actions runs k8s 1.23 via kind; if we make the upgrade now, a failure would occur on GitHub Actions.

CleanShot 2024-02-22 at 17 00 52@2x

Here is what is required:

The following changes have to be done in the same PR.

  • Upgrade to a kind k8s node image version that causes the least disruption for GitHub Actions (as I remember, newer kind node images are distroless-based, which opens up other issues); see the sketch after this list.
  • Update the sample CNF to use the stable version of PodDisruptionBudget.
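
As a rough illustration of the first item, and assuming the workflow provisions clusters with helm/kind-action (the testsuite may pin kind differently), pinning the node image would look something like the step below, with whatever image tag turns out to cause the least disruption.

# Hypothetical GitHub Actions step; the actual workflow may provision kind differently.
- name: Create kind cluster
  uses: helm/kind-action@v1
  with:
    node_image: kindest/node:v1.25.3   # pins the k8s version the cluster runs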

I've added a contributions-welcome tag to this ticket so that anyone interested can pick this up to lend a hand.

HashNuke avatar Feb 22 '24 10:02 HashNuke