Detect we are running from inside a pex package
We need to execute a specific action if we are running in a virtual environment or from inside a pex package.
Currently what we do is to detect if the file ends with .pex extension:
import sys

def get_current_pex_filepath():
    # Return the first sys.path entry ending in '.pex', or None if there is none.
    pex_paths = [path for path in sys.path if path.endswith('.pex')]
    if pex_paths:
        return pex_paths[0]
    return None
This works for pex files without an entry point, but when specifying the bdist_all argument with an entry point, the file is generated without the .pex extension. Also, when manually generating the pex file with the command line -o option, one can use any file extension.
Is there a clean way to detect that we are running from inside a pex file?
Related to #626 - you should be able to import _pex when you're running from a PEX package, and not otherwise. It would be nice to have a canonical way to do this though.
OK. Thanks. This is fine for me.
Do you want to keep this ticket open for a cleaner solution? I would say setting an env variable in the startup script should be fine: it seems easy to implement and needs no additional Python modules to load. I'm not sure whether existing env variables like PEX_ROOT are already available (when I tried, it seemed like they aren't).
I can submit a PR if you advise a solution. Otherwise we can also close.
@jsirois
A read-only env variable, say PEX_VERSION, seems reasonable to me - but it is not robust. If, while running from a pex, you then execute some other Python code in a subprocess, that process will inherit PEX_VERSION by default, and the Python code you're executing, if pex aware, will be tricked into thinking it is running from inside a pex by the PEX_VERSION env var being present. It seems more robust to me to use a pex API. The _pex hack works today, but adding pex to your install_requires and then querying a known-to-be-present API like pex.runtime.running_from_pex seems ideal.
- Your code truly does depend on pex if it knows about it, so depending on pex directly seems kosher.
- The tricky logic of knowing if you're running from a pex regardless of PEX_FORCE_LOCAL stays inside pex where it belongs.
- You avoid the _pex internal detail of bootstrapping.
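To make the inheritance caveat concrete, here is a minimal sketch of how an env var marker leaks into child processes. It assumes the hypothetical PEX_VERSION variable proposed above (Pex does not set it); the helper name child_sees_marker is made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical marker variable from the proposal above; Pex does not set this.
MARKER = "PEX_VERSION"


def child_sees_marker(env):
    """Run a plain Python child process and report whether it sees MARKER."""
    code = "import os; print({!r} in os.environ)".format(MARKER)
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode().strip() == "True"


# Pretend the PEX bootstrap set the marker in the parent process environment.
parent_env = dict(os.environ)
parent_env[MARKER] = "2.0.0"

# The child is ordinary Python, not a PEX, yet it inherits the marker.
inherited = child_sees_marker(parent_env)

# The parent has to scrub the variable explicitly to avoid misleading the child.
scrubbed_env = {k: v for k, v in parent_env.items() if k != MARKER}
scrubbed = child_sees_marker(scrubbed_env)

print("child sees marker when inherited:", inherited)
print("child sees marker after scrubbing:", scrubbed)
```

Any pex-aware code in the child would need the parent to scrub the variable, which is the fragility being described.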
What do you think about all this?
Yes, we depend on pex anyway. So we could also include it and use an api.
The thing I still don't understand is how to pass the state that we need to capture somehow in bootstrapping in _pex and then return in pex.runtime.running_from_pex.
It is easy to do with an env variable because there are no dependencies. If done in code, I suppose it can be done by setting a variable in pex.runtime from _pex.bootstrapper, something like:
import pex
pex.runtime.from_pex = True
This seems ugly, as _pex is just the pex modules moved to _pex, so we would have _pex and pex containing the same modules. Not sure how to do this in a clean way. Maybe the best would be renaming _pex to pex and then adding a pex.runtime.running_from_pex function?
@jsirois
OK. I had a look at the code and it seems like there are currently already _pex and pex modules.
Also there is already a warning about _pex being removed.
I tried setting this in pex_builder
from pex import runtime
runtime.running_from_pex = True
But it is somehow erased when the pex entry point is executed. Maybe I missed something here. I'm not sure exactly what the workflow is up to the point where the entry point is executed.
Anyway. I suppose you know best how to fix this. I will go for the _pex hack for now. If you can fix that for v2.0.0 it would be nice.
@fhoering I think this is now possible in two ways:
1. PEXes now export PEX pointing to the originating PEX file, packed PEX directory or loose PEX directory that launched the PEX process. See #1495.
2. PEXes now expose a public __pex__ package that can be used as a magic import hook. You add the PEX file to sys.path (perhaps via PYTHONPATH) and then you can import from the PEX by either prefixing a full import with __pex__; e.g.: from __pex__ import requests, or by just importing __pex__ alone once before importing any dependencies contained in the PEX file on the sys.path. See #1845.
Even though neither 1 nor 2 was intended to support this case, I think they both serve just fine. If you prefer to read an env var, use PEX and keep the caveat in mind that for multiple nested subprocess spawns, this could prove confusing and error prone. If you prefer access to a Pex API as the indicator you've booted from a PEX, just wrap import __pex__ in a try/except.
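Both options can be sketched in a few lines. This is only an illustration under the assumptions stated in this comment: the function names pex_from_env and booted_from_pex are made up here and are not part of any Pex API, and the env var check carries the nested-subprocess caveat already mentioned.

```python
import os


def pex_from_env():
    """Return the value of the PEX env var, or None if it is unset or empty.

    Caveat: child processes inherit this variable, so a non-PEX child
    spawned from a PEX process will also see it.
    """
    return os.environ.get("PEX") or None


def booted_from_pex():
    """Return True if the __pex__ package is importable, False otherwise.

    Note that a successful import of __pex__ also activates its import hook.
    """
    try:
        import __pex__  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("PEX env var:", pex_from_env())
    print("__pex__ importable:", booted_from_pex())
```

Which one to prefer depends on whether you would rather accept the env var inheritance caveat or depend on the import being available.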
Does this state of affairs work for you @fhoering?
@fhoering I'm going to assume this works for you or would have and close this issue. Both the env var approach and the ~API approach work. A small Pex library release separate from pex (say pex-runtime-utils) could be released to do the try / import __pex__ / except for you, but I'll refrain from jumping on that until you or others speak out about a need.
@jsirois It is fine. thanks. We actually use the PEX env variable now. https://github.com/criteo/cluster-pack/blame/3f809e4b18ae491e7d8991b948d006d8d05bd9f0/cluster_pack/packaging.py#L527