sh icon indicating copy to clipboard operation
sh copied to clipboard

[wip] Searching commands, use PATH from baked-in `_env`

Open eode opened this issue 3 years ago • 11 comments

This modifies resolve_command and resolve_command_path to respect the PATH from a baked-in _env option.

Problem & Solution

It is difficult to use sh to access executables in a venv. If sh respects a baked-in _env argument when searching commands, it works acceptably. Note that this also extends to respecting the path of the baked-in _env in general.

Example Program

Installed into venv's bin folder.

#! python3
print('moop!')

Example Before PR

# enter a virtual environment
# naive, but just an example.
>>> venv = dict([line[:-1].split('=', 1) for line in sh.bash('-c', '. test/bin/activate && env')])
>>> vsh = sh.bake(_env=venv)
>>> vsh.moop()
[traceback...]
CommandNotFound: moop

Example after PR

# enter a virtual environment
# naive, but just an example.
>>> venv = dict([line[:-1].split('=', 1) for line in sh.bash('-c', '. test/bin/activate && env')])
>>> vsh = sh.bake(_env=venv)
>>> vsh.moop()
moop!

Now, it would be a nice feature if there were a way to set the venv by specifying an activate file, like: vsh = sh._activate('test/bin/activate')

..however, this is functional enough and all I have time for. I thought you might be interested in it.

eode avatar May 19 '22 18:05 eode

I'm not sure I like this. We're essentially lifting environment variables that should be applied in the subprocess into the parent process. If you want to resolve commands differently in the parent process, then you can set PATH explicitly:

>>> import sh
>>> import os
>>> sh.python("-c", "import sys; print(sys.executable)")
/Users/erik/venv1/bin/python

>>> os.environ["PATH"] = f'/Users/erik/venv2/bin:{os.environ["PATH"]}'
>>> sh.python("-c", "import sys; print(sys.executable)")
/Users/erik/venv2/bin/python

If you're just interested in pointing to a different python executable, you can do that explicitly:

python = sh.Command("/Users/erik/venv2/bin/python")

ecederstrand avatar May 20 '22 07:05 ecederstrand

I can definitely understand that you want to keep a good separation between the sub- and parent process. Perhaps a different method than sh(_env=venv) would be better for working with venv. However, I very much don't want to be changing the global Python environment, swapping it back and forth whenever I want to use an sh instance in a different environment. The gold of it is that it provides a very nice way to manage multiple differing environments.

The thing that's nice about it is I can do something like this:

venvs = []
for activator in base_path.glob('**/bin/activate'):
    envfile = (activator.parent.parent / '.env').read_text()  # previously created
    env = dict(line.split('=', 1) for line in envfile.split('\n') if line.strip())
    vsh = sh(_env=env)
    venvs.append(vsh)
  
# just an example
for vsh in venvs:
    try:
        vsh.python('-c', 'pipdeptree')
    except ErrorReturnCode_1
        vsh.pip.install(pipdeptree

Basically, sh can be very useful when taking actions across multiple virtualenvs, and I haven't interfered with my global interpreter state (which would affect all sh instances).

Now -- perhaps overloading the (_env) parameter isn't the way -- but some way of setting the environ for a specific SH instance would be nice.

eode avatar May 20 '22 09:05 eode

It's not clear to me what you mean by "sh instance". When you import the sh package in your script, it will be the package installed in the package collection of the parent Python interpreter. You're never hitting the sh pacakge of a virtualenv.

If you want to install packages in a range of virtualenvs, you can just do like this:

for venv_python in base_path.glob('**/bin/python'):
    python = sh.Command(venv_python)
    python('-m', 'pip', 'install', 'pipdeptree')

ecederstrand avatar May 20 '22 09:05 ecederstrand

It's not clear to me what you mean by "sh instance". When you import the sh package in your script, it will be the package installed in the package collection of the parent Python interpreter. You're never hitting the sh pacakge of a virtualenv.

I'm not at all talking about executing the sh.py code that is installed into a part of the virtualenv's site-packages (if it's even installed inside a venv I'm working with). I'm not even ultimately talking about venvs, they're just a good example.

I'm talking about having an environment associated with any particular <module 'sh'>.SelfWrapper.

With this change:


import sh
from my_lib import my_env

esh = sh(_env=my_env)

then:

  • sh uses the global environment, os.environ
  • esh uses a different environment, my_lib.my_env.

I think this has the following benefits:

  • it's powerful
  • it's conceptually simple
  • it saves the fingers
  • it makes it easier to work with multiple environments.

I don't think you could convince me otherwise, because I'm already using it. :-)

Now, I could be convinced that what I did in this PR was simple as opposed to complete, and that you (or I) could find a better way to implement it.

eode avatar May 20 '22 19:05 eode

From the perspective of aligning with behavior in the shell, I agree that it's reasonable to take PATH into account when resolving the path to the executable:

% PATH=/Users/erik/venv1/bin which python
/Users/erik/venv1/bin/python
% PATH=/Users/erik/venv2/bin which python
/Users/erik/venv2/bin/python

We can't easily make that work in the general case in sh because the executable path is resolved before the method is executed. I.e. the equivalent of the above does not work because PATH is interpreted too late:

>>> import sh
>>> sh.python3("-c" "import sys; print(sys.executable)")
/usr/bin/python3
>>> sh.python("-c" "import sys; print(sys.executable)", _env={"PATH": "/Users/erik/venv1/bin"})
/usr/bin/python3

@amoffat What do you think? Does it make sense to change the way path resolving is done within SelfWrapper?

ecederstrand avatar May 24 '22 06:05 ecederstrand

Thanks for both of your thoughts on the problem. Let me repeat back the problem to make sure I understand it:

Essentially @eode would like the ability to call sh.something and have something resolve to a location that is outside of the current process's PATH, without altering the current process's PATH. The immediate application of this is calling executables that live in a virtualenv.

I agree with you both that it is reasonable to have sh optionally look at some kind of path variable when resolving the location of an executable. I agree with @ecederstrand that it isn't possible to do that, since the creation of the Command object happens before the arguments are considered. I remembered that there is some form of this in the Command class, via the search_paths argument, and it only works precisely because the executable lookup happens once we already have the search_paths arg.

I propose we could use something similar: there should exist a _paths (or _search_paths for consistency?) special keyword argument that is used for looking up executables, and it functions the same way the Command class's search_paths argument functions:

sh.python(_paths=["my/.venv/bin"])

Now I know what you're thinking: it still doesn't work because sh.python is resolved before _paths is considered. Yes, but if that special kwarg existed, we could leverage the default arguments feature:

my_venv = sh(_paths["my/.venv/bin"])
my_venv.python()

That would work because the _paths argument can be considered before the .python() lookup happens. I think this would be a better fit than reusing _env, for the reasons that you all have discussed earlier in the thread (keeping subprocess env abstraction from leaking into the parent process).

The obvious downside is that sh.python(_paths=["my/.venv/bin"]) doesn't really work at all, so using it could be confusing. However, in the distant future, if it trips up enough people, we could technically implement it using some magic lookup functionality (by returning something like a LazyCommand instead of a Command, which lazily resolves to the underlying executable only when it is called)

What do you all think?

amoffat avatar May 26 '22 02:05 amoffat

I quite like the LazyCommand approach because it could work regardless of whether we call sh.python() directly or via a wrapped sh. It also better matches what happens in the shell; you won't get a Command not found until you actually try to run the executable. It would, however, change the behavior of sh to not give instant feedback on import:

>>> from sh import xxx
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'xxx' from 'sh'
>>> # With a lazy approach:
>>> from sh import xxx
>>> xxx()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'xxx' from 'sh'

I'm not sure a special _path attribute is needed. When we resolve the full path of a command in resolve_command_path() we end up using the PATH env var anyway. Instead of always using PATH from os.environ, we could check baked args first. Altering PATH to change command resolve behavior is so common knowledge that I think it's better to not make sh special in that regard.

ecederstrand avatar May 26 '22 09:05 ecederstrand

By the way you can already achieve this like so:

vsh = sh.env.bake(_env=venv)
vsh.moop()

Because the shell command env also executes a command.

(Printing the environment is just the special case it does when it has no commands. The original main job of env was executing other programs with a modified environment, like this: env FOO=bar qux ... executes qux ... with FOO=bar added to the environment. Of course you can specify any number of environment changes, which includes zero environment changes, so you can just do env qux ... as an indirect way to invoke qux ..., which is normally pointless, but is great in this situation because env has to do a PATH lookup to find qux!)

So, you can invoke env with the PATH set in its environment, and env will use that modified PATH when executing into the rest of its argument list (and it's not even fork+exec - just exec, that same process becomes the other command, so you don't even have any of the bad edge cases that you can get into with child-of-child processes).

And env is a standard program on any UNIX-like system, so you can reasonably rely on it being available.

mentalisttraceur avatar May 27 '22 18:05 mentalisttraceur

@mentalisttraceur TIL! Thanks for explaining.

@eode does this solve your problem?

amoffat avatar May 27 '22 18:05 amoffat

One downside to going through a baked exec chained env is that the __fspath__ method (that we've been discussing in #579) for vsh.moop will report the location of env rather than of moop.

So supporting sh.bake(_path=...) is something you might still want to do anyway?

mentalisttraceur avatar May 27 '22 19:05 mentalisttraceur

I opened a PR to showcase how lazy resolving could work: https://github.com/amoffat/sh/pull/616

With this PR you can do something like:

>>> from sh import python3
>>> python3("-c" "import sys; print(sys.executable)", _env={"PATH": "/Users/erik/venv1/bin"})
/Users/erik/venv1/bin/python3
>>> python3("-c" "import sys; print(sys.executable)", _env={"PATH": "/Users/erik/venv2/bin"})
/Users/erik/venv2/bin/python3

It comes with the caveat mentioned in https://github.com/amoffat/sh/pull/602#issuecomment-1138351950 I'm not sure I like it, but let's discuss here.

ecederstrand avatar Aug 08 '22 06:08 ecederstrand

It seems like @mentalisttraceur 's suggestion solves the original issue. If i'm mistaken please comment and we can reopen

amoffatgmi avatar Feb 09 '23 16:02 amoffatgmi

Apologies for not tracking this issue. Yes, this resolves the original issue.

On Thu, Feb 9, 2023, 11:49 AM amoffatgmi @.***> wrote:

Closed #602 https://github.com/amoffat/sh/pull/602.

— Reply to this email directly, view it on GitHub https://github.com/amoffat/sh/pull/602#event-8481651800, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAS4PT75KTBBTISNBVJEYYTWWUN23ANCNFSM5WNDU3PA . You are receiving this because you were mentioned.Message ID: @.***>

eode avatar Feb 10 '23 00:02 eode