[BUG] - PR pytests are failing, possibly because repo env vars are not available
Describe the bug
Pytests are failing for at least several PRs. Logs indicate that an exception is being raised by many tests because various env vars are not set:
==================================== ERRORS ====================================
_ ERROR at setup of test_set_config_from_environment_variables[nebari_config_options1] _
Traceback (most recent call last):
File "/usr/share/miniconda/envs/nebari-dev/lib/python3.10/site-packages/_pytest/runner.py", line 342, in from_call
<snip>
File "/usr/share/miniconda/envs/nebari-dev/lib/python3.10/os.py", line 680, in __getitem__
raise KeyError(key) from None
KeyError: 'AWS_ACCESS_KEY_ID'
See full logs here:
https://github.com/nebari-dev/nebari/actions/runs/11386526924/job/31851899812?pr=2752
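For context, the traceback bottoms out in a direct `os.environ[...]` lookup, which raises `KeyError` whenever the variable is not exported in the job environment (a minimal sketch, not Nebari's actual code):

```python
import os

# Direct indexing fails hard when the variable is not set,
# matching the KeyError in the traceback above.
os.environ["AWS_ACCESS_KEY_ID"]      # raises KeyError if unset

# A tolerant lookup returns a default instead of raising.
os.environ.get("AWS_ACCESS_KEY_ID")  # returns None if unset
```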
Expected behavior
PR Pytests should pass.
OS and architecture in which you are running Nebari
This is an issue with the existing GitHub-based Nebari test pipeline.
How to Reproduce the problem?
Example failing PRs:
https://github.com/nebari-dev/nebari/pull/2752
https://github.com/nebari-dev/nebari/pull/2730
Command output
No response
Versions and dependencies used.
No response
Compute environment
None
Integrations
No response
Anything else?
No response
Thanks for raising the issue, @tylergraff. Indeed, this has been on our radar since last week. We are not entirely sure why it started happening now. My main guess is that it is related to our change in branches and to some internal global environment variables from the nebari-dev org.
This is quite troublesome since it blocks development in any upstream PR that alters files under src. In an ideal world, these failing tests (likely affecting all providers, at least AWS and GCP so far) should not need credentials at all, and only do so because of the init step that the unit tests exercise.
The best-case scenario would be to decouple such checks completely from these tests, since we already have a separate workflow for them. However, a more likely fix will be to mock the cloud API requests that require credentials using pytest-mock.
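As a rough illustration of that approach (a minimal sketch assuming pytest-mock is installed; `check_kms_key` below is a made-up stand-in, not Nebari's actual provider check):

```python
def check_kms_key(key_id: str) -> dict:
    """Stand-in for a provider check that would call AWS via boto3
    and therefore needs real credentials when run for real."""
    raise RuntimeError("would call AWS here")


def test_kms_check_without_credentials(mocker):
    # Patch the check so the test never reaches AWS and never needs
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY to be set.
    mocker.patch(f"{__name__}.check_kms_key",
                 return_value={"KeyId": "dummy", "Enabled": True})
    assert check_kms_key("dummy")["Enabled"] is True
```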
Another quick fix for any affected PR right now would be to copy the vault block that we have in our cloud provider tests workflow and add a similar step to these failing tests to populate the env vars with the missing credentials.
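(Purely as an illustration of the same idea at the test level, and not the workflow-level vault block referred to above, dummy values could also be injected from a conftest fixture; whether that is desirable here is a separate question:)

```python
import pytest


@pytest.fixture(autouse=True)
def fake_aws_credentials(monkeypatch):
    # Dummy values only; they satisfy os.environ lookups during the
    # unit tests without granting any real cloud access.
    monkeypatch.setenv("AWS_ACCESS_KEY_ID", "test")
    monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "test")
```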
@viniciusdc regarding the vault block fix: it seems this should be a separate PR which gets merged to fix the tests. Do you agree?
@tylergraff @viniciusdc
I created a new fork from nebari-dev main, as opposed to using MetroStar/nebari, which was forked from the previous default develop branch.
Unfortunately, my new test PR #2788, forked from the main branch, still fails the same tests.
I'm curious to see if this commit from last month is causing an error to be reported
Re-opening as the issue still persists, though the work done in the PR above helped clear up some related matters.
Hello @viniciusdc @marcelovilla
I'm wondering: is there a reason this commit did not include .github/workflows/test.yaml in the "Rename default branch to main and get rid of develop as a target" update?
The develop branch still appears to be included as a target for the Pytest tests.
Hey @joneszc. No particular reason other than I probably missed that one. Happy to open another PR removing it.
@marcelovilla
It could be a coincidence, but we are seeing PRs fail specifically on those Pytest tests from that workflow. We are troubleshooting whether the Pytest failures are related to renaming the develop branch to main.
Will you have time to open the PR today for removing develop from the test.yaml branches?
Thanks
@joneszc I've opened https://github.com/nebari-dev/nebari/pull/2792. Would appreciate your review there.
@joneszc now that the PR was merged, are you still seeing the Pytest failures?
Looking into this, I don't think the reason for the pytest failures has anything to do with branches or env vars. Here is what I found.
I wanted to make sure this wasn't due to the PR coming from a forked repo, since secrets are not shared with PRs from forked repos. To test this, I made a branch in the nebari repo and merged @joneszc's PR (https://github.com/nebari-dev/nebari/pull/2788) into my branch. The result was that CI still failed. This eliminated the forked repo as the cause.
I then wanted to eliminate the possibility that the secrets were somehow not available. I could not see a reason they wouldn't be, but more importantly, I could not see a way that the secrets were actually being used in the unit test workflow. Regardless, I explicitly made them available in https://github.com/nebari-dev/nebari/pull/2794/commits/e37bc5c3bc77e917594d549c0e4d2fecce61ff8a. The result was that the error changed, but the workflow still failed: instead of a KeyError, it said the provided token was invalid. This told me two important things. First, the action had access to the secrets. Second, the action was not using the secrets back when it passed. This led me to believe that these were genuinely failing unit tests, not a problem with the pipeline.
I next wanted to investigate how the other tests were passing. Looking at tests/tests_unit/conftest.py, I saw that there were pytest fixtures for other assets such as instances, Kubernetes versions, etc. This PR introduces a new AWS resource check (a KMS key check) but does not add the corresponding pytest mock for the unit tests. This is why the tests were previously passing but are now failing. The relevant fixture is at https://github.com/nebari-dev/nebari/blob/215bbd622f0ea1692e79bb4411db81ef28b1d329/tests/tests_unit/conftest.py#L40
In summary, I believe that if @joneszc adds the relevant mock to the pytest fixture, his code will pass and this issue can be closed. I did a quick POC at https://github.com/nebari-dev/nebari/pull/2794 and you can see the tests are passing. Obviously @joneszc will want a real implementation with good test coverage, but this at least shows that the approach resolves the issue. This looks like the cause for the other failing PRs as well.
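For anyone landing here later, a rough sketch of what such an autouse mock fixture can look like (the function names below are stand-ins defined inside the snippet itself; the real patch targets live in Nebari's provider code and in tests/tests_unit/conftest.py):

```python
from unittest import mock

import pytest


def fetch_kubernetes_versions():
    """Stand-in for an existing, already-mocked provider check."""
    raise RuntimeError("would call AWS here")


def fetch_kms_key(key_id: str):
    """Stand-in for the newly added KMS key check."""
    raise RuntimeError("would call AWS here")


@pytest.fixture(autouse=True)
def mocked_aws_checks():
    # Patch every provider lookup the config validation performs,
    # including the new KMS check, so unit tests never call the AWS API.
    with mock.patch(f"{__name__}.fetch_kubernetes_versions",
                    return_value=["1.29", "1.30"]), \
         mock.patch(f"{__name__}.fetch_kms_key",
                    return_value={"KeyId": "dummy", "Enabled": True}):
        yield


def test_kms_check_is_mocked():
    assert fetch_kms_key("dummy")["Enabled"] is True
```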
Nice, that's great! It explains why we only saw the failures when a change was made to the provider checks. So, this needs to be documented in the unit-tests file or in the provider checks so that we are more aware of it in the future. Thanks a lot for the summary and for having a look at this @dcmcand :rocket:
The suggested changes resolved this issue. Closing