[MAINT] - Address flakiness of integration tests
Context
With the recent adoption of the await workflow (a welcome change, since we previously had to run the kubectl commands manually ourselves), we have occasionally seen issues with the image puller: in a couple of deployments the workflow appeared to get stuck waiting for it. This looks like flaky behavior and requires further validation. We may need to increase the time limit or the number of retries.
source: https://github.com/nebari-dev/nebari/actions/runs/12994981631/job/36240642535?pr=2924
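To make the "increase the time limit or retries" idea concrete, here is a minimal sketch of a poll-with-timeout helper. It is not how the await workflow is implemented; `wait_until`, its parameters, and the `check` callback are hypothetical names used only for illustration, and the real fix may simply be a setting on the existing workflow step.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed.

    `check` is a hypothetical readiness probe (for example, "are all pods ready?").
    `sleep` and `clock` are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            # Give up: the caller decides whether this fails the job or triggers a retry.
            return False
        sleep(interval)
```

A longer `timeout` tolerates slow image pulls, while a shorter `interval` only changes how quickly readiness is noticed; neither helps if the workload can never become ready.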
Also, during releases we have a hard time running CI against version bumps: at that point in the release workflow the new images are not yet available, so the deployment fails the pod health check (namely for jupyterhub).
source: https://github.com/nebari-dev/nebari/actions/runs/12952884533/job/36211476433?pr=2924
Value and/or benefit
Running/stable testing
Anything else?
No response
@viniciusdc I think the first case is related to https://github.com/nebari-dev/nebari/issues/2947. However, I agree our tests seem to be flaky and that needs to be addressed.
I recently noticed that there is another action you can run with jupyterhub/action-k8s-await-workloads@v3 that lets you inspect the affected pods (we usually don't need it, since it generates too much data). For this specific error, it helped me find a problem with promtail, as seen below:
This is a known issue when running Kind: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
I think I addressed this in the past, but the update to Ubuntu 24.x in #2958 may have removed that fix.
Since this is a bit different and mostly associated with the above update, I will open a new issue:
- [ ] Address fsnotify "too many open files" error on test-local-integration
This workflow run shows the above error message: https://github.com/nebari-dev/nebari/actions/runs/13415146171/job/37478696321?pr=2965, and here is a second run with the inotify update applied: https://github.com/nebari-dev/nebari/actions/runs/13417860818/job/37483043593?pr=2965
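For anyone who wants to check a runner before starting Kind, here is a rough sketch that compares the current inotify limits against the values suggested on the Kind known-issues page linked above (`fs.inotify.max_user_watches=524288` and `fs.inotify.max_user_instances=512`). The function name `inotify_shortfalls` and its parameters are made up for this example; it only reads the limits and does not change them.

```python
from pathlib import Path

# Values suggested by the Kind known-issues page for the "too many open files" error.
RECOMMENDED = {"max_user_watches": 524288, "max_user_instances": 512}


def inotify_shortfalls(proc_dir="/proc/sys/fs/inotify", recommended=RECOMMENDED):
    """Return {name: (current, wanted)} for every limit below the suggested value.

    `proc_dir` is a parameter (hypothetical, for testability) pointing at the
    directory where Linux exposes the inotify sysctls as plain text files.
    """
    shortfalls = {}
    for name, wanted in recommended.items():
        path = Path(proc_dir) / name
        if not path.exists():
            continue  # not Linux, or the limit is not exposed in this environment
        current = int(path.read_text().strip())
        if current < wanted:
            shortfalls[name] = (current, wanted)
    return shortfalls
```

Calling `inotify_shortfalls()` on the runner and failing early when the result is non-empty would surface the problem before promtail hits it, instead of after the deployment times out.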