katib icon indicating copy to clipboard operation
katib copied to clipboard

Introduced error constants and replaced reflect with cmp

Open tariq-hasan opened this issue 11 months ago • 23 comments

What this PR does / why we need it:

Following the suggestion https://github.com/kubeflow/katib/pull/2281#pullrequestreview-1926741952 from @tenzen-y this code change provides a more thorough comparison of errors through the introduction of error constants and the replacement of replace.DeepEqual with cmp.Diff.

In addition some of the test cases in generator_test.go have been revised to allow for a more thorough comparison of error messages.

Finally cosmetic changes have been applied to some of the tests in order to match the style of the code in config_test.go that compares error constants.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes https://github.com/kubeflow/katib/issues/2268

Checklist:

  • [ ] Docs included if any changes are user facing

tariq-hasan avatar Mar 17 '24 23:03 tariq-hasan

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tariq-hasan Once this PR has been reviewed and has the lgtm label, please assign gaocegege for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar Mar 17 '24 23:03 google-oss-prow[bot]

@tariq-hasan You should follow this guide to pass DCO check.

https://github.com/kubeflow/katib/pull/2289/checks?check_run_id=23472655010

tenzen-y avatar Apr 06 '24 18:04 tenzen-y

@tariq-hasan LGTM, but the latest commit seems to violate DCO.

tenzen-y avatar Apr 16 '24 05:04 tenzen-y

@tariq-hasan LGTM, but the latest commit seems to violate DCO.

I addressed the issue. I am wondering if I should squash the commits given that the changes are final now.

tariq-hasan avatar Apr 16 '24 05:04 tariq-hasan

@tariq-hasan LGTM, but the latest commit seems to violate DCO.

I addressed the issue. I am wondering if I should squash the commits given that the changes are final now.

The prow automatically squash commits into one. So, I'm ok with either way.

tenzen-y avatar Apr 16 '24 05:04 tenzen-y

Makes sense.

After reviewing the PR I find that these are the final set of changes

  • introduce error constants
  • replace reflect with cmp for error comparison
  • define test cases as maps instead of slices
  • rewrote test cases for TestGetRunSpecWithHP

I was thinking if each of these should ideally be its own commit so that we maintain a clear separation of concerns.

tariq-hasan avatar Apr 16 '24 05:04 tariq-hasan

Makes sense.

After reviewing the PR I find that these are the final set of changes

  • introduce error constants
  • replace reflect with cmp for error comparison
  • define test cases as maps instead of slices
  • rewrote test cases for TestGetRunSpecWithHP

I was thinking if each of these should ideally be its own commit so that we maintain a clear separation of concerns.

I think that those sets depend on each. So, those should be included in a single PR.

Note that: We have no way to select the merge method merge. The prow enforces the squash method.

tenzen-y avatar Apr 16 '24 05:04 tenzen-y

Makes sense.

tariq-hasan avatar Apr 16 '24 05:04 tariq-hasan

@tariq-hasan Could you address this error: https://github.com/kubeflow/katib/actions/runs/8700291109/job/23861062475?pr=2289?

tenzen-y avatar Apr 16 '24 05:04 tenzen-y

@tariq-hasan Could you address this error: https://github.com/kubeflow/katib/actions/runs/8700291109/job/23861062475?pr=2289?

I guess that we could avoid those errors after you rebase this PR.

tenzen-y avatar Apr 16 '24 05:04 tenzen-y

Okay. I thought that these might have been flakiness issues as the CI workflow seems to pass and fail quite unpredictably for this test case.

I am rebasing the PR.

tariq-hasan avatar Apr 16 '24 05:04 tariq-hasan

Okay. I thought that these might have been flakiness issues as the CI workflow seems to pass and fail quite unpredictably for this test case.

I am rebasing the PR.

No, actually these errors were brought to us by the kubernetes issue w/o supporting SSA for the fake client. And, I addressed that issue recently in the previous katib PR. So, rebasing should fix those errors.

tenzen-y avatar Apr 16 '24 06:04 tenzen-y

Got it.

tariq-hasan avatar Apr 16 '24 06:04 tariq-hasan

@tariq-hasan Uhm, it seems that meaningful errors occur. Could you investigate them?

tenzen-y avatar Apr 16 '24 06:04 tenzen-y

I am working through them now.

tariq-hasan avatar Apr 16 '24 06:04 tariq-hasan

I am working through them now.

Thank you!

tenzen-y avatar Apr 16 '24 06:04 tenzen-y

I did run into the issues here before.

image

image

The errors are not always reproducible however. I usually get these errors in the first one to three runs of make test on my local and then the errors disappear for subsequent runs.

If I remove the kube-apiserver and etcd processes from my local and run make test then I find that the errors sometimes reappear.

That said I now restarted my local and I am running make test ENVTEST_K8S_VERSION=1.27 (or 1.28 or 1.29) but I do not see the issues appearing.

In the few cases where I ran into the errors I did root-cause that the errors were due to the order of execution of this code with the order of execution for this code.

Not sure if you have any thoughts of what might be happening here.

tariq-hasan avatar Apr 16 '24 06:04 tariq-hasan

Please ignore my comments above.

The error was due to the order of test case execution in cases in TestGetRunSpecWithHPConfigMap.

I fixed the issue in this commit and rebased the PR.

I ran the Go tests in a duplicate branch and it looks good.

tariq-hasan avatar Apr 26 '24 04:04 tariq-hasan

@tariq-hasan I merged #2316. So could you rebase this PR?

tenzen-y avatar Apr 30 '24 14:04 tenzen-y