manifests icon indicating copy to clipboard operation
manifests copied to clipboard

Refactor test workflows

Open lampajr opened this issue 10 months ago • 4 comments

Description of your changes:

Upgrade actions/checkout to version v4. This is going to be required as v3 does not run on node20 which will be required in a while.

[edit] As discussed, this PR also contains some refactoring on the gh workflows, e.g., renaming and dependencies

Checklist:

  • [x] Unit tests pass: Make sure you have installed kustomize == 5.2.1+
    1. make generate-changed-only
    2. make test

lampajr avatar Apr 29 '24 15:04 lampajr

Here the link to the Github statement https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/

lampajr avatar Apr 29 '24 15:04 lampajr

  • consistent nameing e.g. vwa_kind_test.yaml -> volumes-web-application.yaml
  • nb_controller_m2m_kind_test.yaml -> notebook_controller.yaml
  • consistent and complete paths for triggering the workflow
  • merging similar tests for the same component. E.g. the m2m tests are usually newer and better.

For a second PR:

I also want to get rid of /go.mod /go.sum /.flake8 and /prow_config if they are not needed somewhere. /docs also looks crazy outdated. KustomizeBestPractices.md, ObsoleteKfctl.md, TestFramework.md and dex-auth might also be obsolete.

juliusvonkohout avatar May 02 '24 15:05 juliusvonkohout

Hi @juliusvonkohout, I think I addressed most of the things we discussed in today meeting:

  • Renamed workflows with a more meaningful convention, I preferred to keep _test in the name to differentiate any other gha that is not test releated (e.g., triage_issue or any other in future)
  • Fixed file dependencies that will trigger those workflows, I hope I did not miss anything but in the worst case they can be added as soon as they are noticed
  • Upgrade actions/checkout action to v4

The only thing I did not do in this PR is the workflow merge as I am not completely sure if two workflows can be really merged together, e.g., kserve and kserve_m2m.

Build & Apply BentoML Yatai Stack manifests in KinD / build (pull_request) Looks like it is failing but I don't think it is related to this change.

lampajr avatar May 02 '24 20:05 lampajr

@lampajr can you fix the bentoml tests?

juliusvonkohout avatar May 06 '24 13:05 juliusvonkohout

@lampajr can you fix the bentoml tests?

I can try to take a look but looking at the previous runs of the test [1], it looks like it never worked :thinking:

Alternatively, we could create a different issue to fix bentoml test, as I think it is not directly related to this PR, wdyt?

[1] https://github.com/kubeflow/manifests/actions/workflows/bentoml_kind_test.yaml

lampajr avatar May 06 '24 16:05 lampajr

Hey @juliusvonkohout, I did some additional investigation and it looks like the contrib/bentoml directory is pretty outdatated, thus in the logs I see:

2024-05-02T20:04:08.5386818Z 1.714680241638799e+09	ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "HorizontalPodAutoscaler.autoscaling", "error": "no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2beta2\""}
2024-05-02T20:04:08.5389859Z sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
2024-05-02T20:04:08.5391173Z 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:139
2024-05-02T20:04:08.5392765Z k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
2024-05-02T20:04:08.5394133Z 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:235
2024-05-02T20:04:08.5395093Z k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
2024-05-02T20:04:08.5396257Z 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:662
2024-05-02T20:04:08.5397237Z k8s.io/apimachinery/pkg/util/wait.poll
2024-05-02T20:04:08.5398196Z 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:596
2024-05-02T20:04:08.5399440Z k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
2024-05-02T20:04:08.5400688Z 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:547
2024-05-02T20:04:08.5401926Z sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
2024-05-02T20:04:08.5403159Z 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:132
2024-05-02T20:04:13.5355408Z + kubectl -n kubeflow wait --for=condition=available --timeout=600s deploy/fraud-detection
2024-05-02T20:04:13.5830234Z Error from server (NotFound): deployments.apps "fraud-detection" not found
2024-05-02T20:04:13.5852648Z make: *** [Makefile:22: test] Error 1

It looks like it is complaining about missing HorizontalPodAutoscaler CRD in autoscaling/v2beta2 api version which should be present in older versions of Kubernetes (?) - not sure if that is the real root cause, but it could be.

Then, just as a test, I changed the version to (not the latest but at least newer versions):

BENTOML_YATAI_IMAGE_BUILDER_VERSION ?= 1.1.25
BENTOML_YATAI_DEPLOYMENT_VERSION ?= 1.1.21

And I re-generated the contrib/bentoml resources:

modified:   bentoml-yatai-stack/bases/yatai-deployment/resources.yaml
modified:   bentoml-yatai-stack/bases/yatai-image-builder/resources.yaml

And with those new resources, the test seems working (at least the kubectl wait conditions are met).

In conclusion it seems like the issue is not in the test itself, but it looks like the bentoml manifests are not really up to date, is this expected/wanted?

Anyway I think this could be a topic that should be discussed separately from this PR, unless we already know the answer (do you?).

lampajr avatar May 06 '24 16:05 lampajr

Yes, please update the bentoml stuff, because we have to pass all tests for now. If it is a small change even in this PR here.

juliusvonkohout avatar May 07 '24 07:05 juliusvonkohout

Yes, please update the bentoml stuff, because we have to pass all tests for now. If it is a small change even in this PR here.

I created a separate PR as it looks like the is an error while running the test.sh script inside contrib/bentoml. See https://github.com/kubeflow/manifests/pull/2704

lampajr avatar May 07 '24 08:05 lampajr

Rebased after https://github.com/kubeflow/manifests/pull/2704 merge

lampajr avatar May 07 '24 15:05 lampajr

/lgtm /approve

juliusvonkohout avatar May 07 '24 16:05 juliusvonkohout

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout, lampajr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [juliusvonkohout]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar May 07 '24 16:05 google-oss-prow[bot]