fix: Resolve errors in e2e tests for cypress in Katib UI
What this PR does / why we need it: This PR fixed the errors in e2e tests for cypress.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2381
Checklist:
- [ ] Docs included if any changes are user facing
This PR relies on a commit in my forked repository: https://github.com/kubeflow/kubeflow/compare/master...tariq-hasan:kubeflow:fix-katib-ui-tests.
The commit needs to be merged into the kubeflow repository and the COMMIT file in this PR needs to be updated with the latest commit tag after the merge.
Also the addition of the process package in components/crud-web-apps/common/frontend/kubeflow-common-lib/package.json and the update to the associated package-lock.json may have repercussions for crud-web-apps. This needs to be assessed further.
The e2e tests involving Cypress have passed: https://github.com/kubeflow/katib/actions/runs/9935880934?pr=2384.
There is a failed test for Katib UI: https://github.com/kubeflow/katib/actions/runs/9935880964/job/27442941694?pr=2384.
This is coming from the npm ci commands in https://github.com/kubeflow/katib/blob/master/cmd/ui/v1beta1/Dockerfile.
I am investigating the issue and will have it fixed.
Thank you so much for investigating this and for this fix @tariq-hasan!
Please can you submit PR into kubeflow/kubeflow to add this library so we can take the appropriate commit for Katib repo?
Thank you so much for investigating this and for this fix @tariq-hasan! Please can you submit PR into
kubeflow/kubeflowto add this library so we can take the appropriate commit for Katib repo?
Sounds good. I'll try to fix the failed Katib UI test so that I can identify if any other code changes are required for kubeflow/kubeflow. I am then creating a PR to merge the code changes in kubeflow/kubeflow and will then update this PR for review.
Failed Katib UI test: https://github.com/kubeflow/katib/actions/runs/9935880964/job/27442941694?pr=2384
/rerun-workflow "Publish Katib Core Images"
/rerun-all
@tariq-hasan @tenzen-y It looks like /rerun-all is working only right now.
We might need to wait until this one will be merged to rerun specific workflow: https://github.com/estroz/rerun-actions/pull/3
Thank you for the clarification.
I did some further investigation on the Publish Katib Core Images workflow.
I found that the workflow fails within the final stage of the Dockerfile due to the results coming from npm ci.
If I replace npm ci with npm install --no-package-lock the workflow succeeds.
That said, npm ci uses package-lock.json instead of package.json and is therefore a more stringent validation of the configuration in the frontend folder. In addition, npm ci is less resource-intensive compared to npm install. As such, I suppose that npm install --no-package-lock is not a feasible option.
As npm install --no-package-lock succeeds and npm ci fails the issue could be due to package-lock.json not working in the node:14 base image as it was generated in a different environment on my local.
I will try to do some more investigation to determine why the build is failing and find an appropriate fix.
Thank you for this deep investigation @tariq-hasan!
@kimwnasptd @elenzio9 @orfeas-k @kubeflow/wg-data-leads @ederign @alexcreasy
cc some UI folks on problems with Katib and Kubeflow UIs.
Please can you help us to resolve problems with npm and node ?
@tariq-hasan @andreyvelich I think the answer is already highlighted in that previous commnet
package-lock.json not working in the node:14 base image as it was generated in a different environment on my local.
If the build succeeds with npm install --no-package-lock, we 'd need to update package-lock.json file (and commit it) from the same node and npm version with the one being used in the CI.
What changes do we need to make for Kubeflow UI library ?
@andreyvelich I believe what @orfeas-k meant is that @tariq-hasan generated package-lock.json file in the PR needs to match the node (and npm) version of the node where the 'Publish Katib Core Images' is being executed.
@tariq-hasan, could you please double-check that?
Also, I'm new to the project, but is there a reason why we are upgrading to node 14 instead of node 16 like kubeflow/kubeflow is using?
Also, I'm new to the project, but is there a reason why we are upgrading to node 14 instead of node 16 like kubeflow/kubeflow is using?
I think, the common Kubeflow frontend component that we use doesn't use Node 16 today: https://github.com/kubeflow/kubeflow/blob/master/components/crud-web-apps/common/frontend/kubeflow-common-lib/package.json#L62
@tariq-hasan Did you check if cypress tests can be succeeded on Node 16 ? If yes, we should just update it.
I am upgrading kubeflow-common-library in crud-webapps as well as katib-ui to node 16.
That said, the following components are still using node 12:
I will also try to reproduce the build from a Dockerfile on my local to ensure that the package-lock.json is correctly generated.
/rerun-all
/rerun-all
Using package-lock.json from a build of the Dockerfile did not fix the errors.
However, upgrades of both crud-webapps/common/frontend/kubeflow-common-library and katib-ui to node 16 fixed the errors.
Please let me know if there are any feedback.
I will then create a PR to move kubeflow-common-library to node 16.
Yes, @tariq-hasan please create PR into kubeflow/kubeflow to update Node version for common library.
Got it. As part of the PR, I will upgrade the following components from version 12 to version 16 to ensure compatibility with kubeflow-common-library and for the CI tests in kubeflow/kubeflow to pass successfully.
Got it. As part of the PR, I will upgrade the following components from version 12 to version 16 to ensure compatibility with
kubeflow-common-libraryand for the CI tests inkubeflow/kubeflowto pass successfully.
Thank you for the great investigation! Once you open the PR in the kubeflow/kubeflow repository, please put the link on this PR.
@tariq-hasan Any success with submitting PR into kubeflow/kubeflow so we can fix the e2e tests in Katib UI ?
I apologize for the delay. I've updated the configuration for tensorboards. I am working through jupyter, volumes and centraldashboard-angular, and will create a PR in kubeflow/kubeflow shortly.
Thank you @tariq-hasan, appreciate your time for it!
/rerun-all
I have created a PR for kubeflow/kubeflow: https://github.com/kubeflow/kubeflow/pull/7637.
This PR upgrades the node version from 12 to 16 for kubeflow-common-lib and its dependent components: jupyter, tensorboards, volumes and centraldashboard-angular.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/remove-lifecycle stale
@andreyvelich @tenzen-y I was wondering if you have any thoughts on why the E2E Test with enas-cnn-cifar10 could be failing.
I have re-run it multiple times but to no success.
I have not seen this test fail before so this is quite odd.
/rerun-all
/lgtm /approve