pex Detect we are running from inside a pex package

We need to execute a specific action if we are running in a virtual environment or from inside a pex package.

Currently what we do is to detect if the file ends with .pex extension:

import sys

def get_current_pex_filepath():
    # Return the first sys.path entry that ends with '.pex', or None.
    pex_paths = [path for path in sys.path if path.endswith('.pex')]
    if pex_paths:
        return pex_paths[0]
    return None

This works for pex files without an entry point, but when specifying the bdist_all argument with an entry point, the file is generated without the .pex extension. Also, when manually generating the pex file with the command line -o option, one can use any file extension.

Is there a clean way to detect that we are running from inside a pex file?

May 02 '19 13:05 fhoering

Related to #626 - you should be able to import _pex when you're running from a PEX package, and not otherwise. It would be nice to have a canonical way to do this though.
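A minimal sketch of that check, assuming (as stated above) that _pex is importable only when running from a PEX package. The function name running_from_pex_legacy is made up for illustration and is not part of pex; _pex is an internal bootstrapping detail, so treat this as a stopgap:

```python
def running_from_pex_legacy():
    """Return True if the internal `_pex` package can be imported.

    Assumption taken from the comment above: `_pex` exists only when
    the process was bootstrapped from a PEX package.
    """
    try:
        import _pex  # noqa: F401  (imported only to probe for its presence)
    except ImportError:
        return False
    return True
```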

May 23 '19 03:05 AlexHill

OK. Thanks. This is fine for me.

Do you want to keep this ticket open for a cleaner solution? I would say setting an env variable in the startup script should be fine: it seems easy to implement and needs no additional python modules to load. I am not sure whether existing env variables like PEX_ROOT are already available (when I tried, it seemed like they aren't).

I can submit a PR if you advise a solution. Otherwise we can also close.

May 23 '19 11:05 fhoering

@jsirois

May 23 '19 17:05 fhoering

A read-only env variable, say PEX_VERSION, seems reasonable to me - but it is not robust. If, while running from a pex, you then execute some other python code in a subprocess, that process will inherit PEX_VERSION by default, and the python code you're executing, if pex aware, will be tricked into thinking it is running from inside a pex by the PEX_VERSION env var being present. It seems more robust to me to use a pex API. The _pex hack works today, but adding pex to your install_requires and then querying a known-to-be-present API like pex.runtime.running_from_pex seems ideal:

Your code truly does depend on pex if it knows about it, so depending on pex directly seems kosher.
The tricky logic of knowing if you're running from a pex regardless of PEX_FORCE_LOCAL stays inside pex where it belongs.
You avoid the _pex internal detail of bootstrapping.
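The inheritance problem with an env variable marker can be sketched as follows. PEX_VERSION here is only the hypothetical variable name floated above, and child_inherits_marker is an illustrative helper, not a pex API. The parent sets the marker the way a PEX bootstrap might, then spawns a plain Python child that is not running from a PEX at all:

```python
import os
import subprocess
import sys


def child_inherits_marker(name="PEX_VERSION", value="1.2.3"):
    """Return True if a plain child interpreter sees the parent's marker."""
    previous = os.environ.get(name)
    os.environ[name] = value  # simulate the bootstrap setting the marker
    try:
        # The child is an ordinary interpreter started without a PEX.
        code = "import os; print(os.environ.get(%r, ''))" % name
        out = subprocess.check_output([sys.executable, "-c", code])
    finally:
        # Restore the parent's environment to what it was before.
        if previous is None:
            del os.environ[name]
        else:
            os.environ[name] = previous
    return out.decode().strip() == value
```

Any pex-aware code in the child that only checks for the variable would conclude it is running from a pex, which is the false positive described above.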

What do you think about all this?

May 24 '19 16:05 jsirois

Yes, we depend on pex anyway. So we could also include it and use an api.

The thing I still don't understand is how to pass the state that we need to capture somehow during bootstrapping in _pex and then return it from pex.runtime.running_from_pex. It is easy to do with an env variable because there are no dependencies. If done in code, I suppose it can be done by setting a variable in pex.runtime from _pex.bootstrapper, something like:

import pex
pex.runtime.from_pex = True

This seems ugly, as _pex is just the pex modules moved to _pex. So we would have _pex and pex containing the same modules. I am not sure how to do this in a clean way. Maybe the best option would be renaming _pex to pex and then adding a pex.runtime.running_from_pex function?

May 29 '19 07:05 fhoering

@jsirois OK. I had a look at the code and it seems like there are currently already _pex and pex modules. Also there is already a warning about _pex being removed.

I tried setting this in pex_builder

from pex import runtime
runtime.running_from_pex = True

But it is somehow erased when the pex entry point is executed. Maybe I missed something here. I am not sure exactly what the workflow is until the entry point is executed.

Anyway. I suppose you know best how to fix this. I will go for the _pex hack for now. If you can fix that for v2.0.0 it would be nice.

Jun 10 '19 16:06 fhoering

@fhoering I think this is now possible in two ways:

1. PEXes now export PEX pointing to the originating PEX file, packed PEX directory or loose PEX directory that launched the PEX process. See #1495.
2. PEXes now expose a public __pex__ package that can be used as a magic import hook. You add the PEX file to sys.path (perhaps via PYTHONPATH) and then you can import from the PEX by either prefixing a full import with __pex__; e.g.: from __pex__ import requests, or by just importing __pex__ alone once before importing any dependencies contained in the PEX file on the sys.path. See #1845.

Even though neither 1 nor 2 was intended to support this case, I think they both serve just fine. If you prefer to read an env var, use PEX and keep the caveat in mind that for multiple nested subprocess spawns, this could prove confusing and error prone. If you prefer access to a Pex API as the indicator you've booted from a PEX, just wrap import __pex__ in a try/except.
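A rough sketch of both options, under the assumptions described above. The helper names pex_path_from_env and booted_from_pex are made up for illustration and are not part of Pex; only the PEX env var and the __pex__ package come from the description. Note that importing __pex__ also acts as the import hook described in 2, so the probe is not side-effect free:

```python
import os


def pex_path_from_env(environ=None):
    """Return the value of the PEX env var, or None if unset or empty.

    Caveat from above: child processes inherit PEX, so nested subprocess
    spawns can see it without having been launched from a PEX themselves.
    """
    environ = os.environ if environ is None else environ
    return environ.get("PEX") or None


def booted_from_pex():
    """Return True if the public `__pex__` package can be imported."""
    try:
        import __pex__  # noqa: F401  (probe only; also installs the hook)
    except ImportError:
        return False
    return True
```

For example, pex_path_from_env({"PEX": "/tmp/app.pex"}) returns "/tmp/app.pex", while pex_path_from_env({}) returns None.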

Does this state of affairs work for you @fhoering?

Aug 15 '24 23:08 jsirois

@fhoering I'm going to assume this works for you or would have and close this issue. Both the env var approach and the ~API approach work. A small Pex library release separate from pex (say pex-runtime-utils) could be released to do the try / import __pex__ / except for you, but I'll refrain from jumping on that until you or others speak out about a need.

Aug 17 '24 02:08 jsirois

@jsirois It is fine, thanks. We actually use the PEX env variable now. https://github.com/criteo/cluster-pack/blame/3f809e4b18ae491e7d8991b948d006d8d05bd9f0/cluster_pack/packaging.py#L527

Nov 12 '24 17:11 fhoering