grass icon indicating copy to clipboard operation
grass copied to clipboard

grass.script: Allow init to modify only specified environment

Open wenzeslaus opened this issue 1 year ago • 9 comments

The grass.script.setup.init function modifies os.environ. While os.environ is fit for many, if not most, cases, usage in tests and parallel processing is limited and in all cases, os.environ stays partially modified even after session is finished (we don't destroy the runtime environment, i.e., variables such as GISBASE). With this change, init takes env parameter specifying the environment to modify and it modifies that environment. When no env is provided, it still modifies os.environ, so the default behavior is not changed.

This required only few changes to the initialization code, but more changes were needed for the cleanup code. A lot of tests can now take advantage of this functionality and this PR updates some of them. I plan to update all where it is applicable, but will leave as is some others, namely those which use ctypes (situation is more complex there) and grass.jupyter (env parameter is not implemented for most of them).

I'm not decided whether this should go to 8.4 or wait for 8.5. There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. However, if someone gives it a careful read, it might be possible to have this for 8.4. I don't think there is much concern in terms of API because env parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment.

This includes changes to g.message interface (python/grass/script/core.py) which I intent to submit in a separate PR (#3439). Also, python/grass/script/tests/conftest.py needs a doc fix (done).

Special thanks goes to @echoix for the advice about pytest's monkeypatch which made the test code possible (and simpler in other places).

wenzeslaus avatar Feb 19 '24 22:02 wenzeslaus

I also have concerns about passing around an object that has all env vars populated. I know in CI, we need to be careful to avoid leaking them to software that doesn't require them, and to not show them in logs, as we wouldn't want to see the tokens and secrets. I don't know what are the practices in a Python software, but I would like to make sure that we don't accidentally dump them in a test.

Is the CI-security class that sent a PR yesterday would be interested in that question, or have an answer for this?

echoix avatar Feb 19 '24 23:02 echoix

I also have concerns about passing around an object that has all env vars populated.

One thing is that the object is readily available already in Python:

import os
print(os.environ)

Another thing is that we are passing these objects around already to achieve various tasks such as data management and parallelization:

env = create_environment(...)
run_command(...env=env)

I have a WIP PR #2923 which aims at hiding some of that, but the env would be simply hidden in another object changing tyhe above to:

with gs.setup.init(...) as session:
    tools = Tools(session=session)
    tools.r_surf_gauss(output="surface")

I know in CI, we need to be careful to avoid leaking them to software that doesn't require them, and to not show them in logs, as we wouldn't want to see the tokens and secrets. I don't know what are the practices in a Python software, but I would like to make sure that we don't accidentally dump them in a test.

I don't know anything else to prevent them besides don't print them. The original variables for the process are always available on Linux with proc/{pid}/environ.

Is the CI-security class that sent a PR yesterday would be interested in that question, or have an answer for this?

Diving into this might too much of a rabbit trail, but I'll discuss it with them.

wenzeslaus avatar Feb 20 '24 02:02 wenzeslaus

I'm not decided whether this should go to 8.4 or wait for 8.5. There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. However, if someone gives it a careful read, it might be possible to have this for 8.4. I don't think there is much concern in terms of API because env parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment.

From what I'm aware of, we don't have a feature-freeze policy here, and we didn't decide to start the RC process yet. If you're committed to including it and think you'd still have the bandwidth to tune up in time for release, I don't see why it couldn't be there.

Otherwise, if these changes are considered too breaking for this release (I personally think that full-code wide changes of location to project is way more impactful than this), then part of the usage of mocking env vars in tests could be integrated in some way. The only test fixture was to set no env vars though, not setting them/replacing them (that would make pytest tear them down afterwards)

echoix avatar Feb 20 '24 04:02 echoix

This touches a lot of files, but that's just the update to use the better API in tests. The changes are in python/grass/script/setup.py. The direct tests for this are in python/grass/script/tests/grass_script_setup_test.py.

wenzeslaus avatar May 03 '24 13:05 wenzeslaus

The CI fails with GISBASE being required in certain context. The test code accounts for that already (or at least it should). Locally, I'm not able to reproduce the issue (just as expected for how the test code is written).

wenzeslaus avatar May 03 '24 15:05 wenzeslaus

Merging either of #3683 and #3688 should indirectly solve the the failing test.

wenzeslaus avatar May 06 '24 00:05 wenzeslaus

I think #3683 is the less controversial of the two, let's start with it.

echoix avatar May 06 '24 00:05 echoix

I also have concerns about passing around an object that has all env vars populated.

This is using a mechanism what the Python subprocess package is using. We use subprocesses and we need to customize the environment to pass different GRASS-related settings. That's how the whole grass.script works. While Bandit warns about usage of subprocess package (dangerous in certain context) and specific usages of the subprocess package, I'm not aware of a Bandit warning related to usage of the env parameter.

I know in CI, we need to be careful to avoid leaking them to software that doesn't require them,...

All the subprocesses in general get all the variables from parent process. Having env actually allows users to sanitize the environment and let the GRASS subprocesses know only about what they want them to know.

I would like to make sure that we don't accidentally dump them in a test.

Printing env parameter or printing all parameters of run_command functions (which would include env) would be probably a bad idea.

wenzeslaus avatar May 06 '24 13:05 wenzeslaus

With #3689 merged, this PR should be rebased to remove the duplicated changes (that has conflicts)

echoix avatar May 09 '24 16:05 echoix