/usr/bin/python3: can't find '__main__' module in 'blah.pex'
I have a pex file which I build via the following command:
pex -v -r requirements.txt -c gunicorn -D . -o blah.pex
which constructs a pex, installing the requirements and all files in the project, and setting the entry point to gunicorn.
The build process works just fine, but when it comes time to run the pex, depending on where I run it (Ubuntu VM / local Mac / Ubuntu Docker), sometimes I get the following error:
/usr/bin/python3: can't find '__main__' module in 'blah.pex'
When I unzip the pex, I do see a main.py file in there, so I'm not sure what the problem is.
Has anyone experienced this error? Any ideas on what the problem is?
Okay, I recently had this problem as well, and it turns out that Python doesn't know how to handle .pex files that are larger than 2 GB in size, and that gives you the incredibly useful "can't find '__main__'" error. So check the size of your pex; if it is over 2 GB, it's time to go on a dependency diet or try to find other ways to package/deploy your project.
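A quick way to run that size check, assuming only the "larger than 2 GB" boundary described above (the helper name is my own, not anything Pex ships):

```python
import os

# Pre-3.13 zipimport chokes on archives around the 2 GiB mark, per the
# discussion in this thread, so treat anything at or beyond 2**31 bytes
# as suspect. Hypothetical helper name, for illustration only.
def pex_too_big_for_zipimport(path, limit=2**31):
    return os.path.getsize(path) >= limit
```

If this returns True, the workarounds discussed below (--venv, --unzip, or --layout packed) are worth trying before a dependency diet.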
Sorry to see this so late - thanks for the bump with data @tentwelfths. Perhaps you could try the next Pex release or Pex master, which now supports a --venv mode. If you build your PEX file using that flag, when run the PEX file will unpack itself into a virtual environment under ~/.pex/venvs and re-execute from there. The upshot is the PEX runs just like any other Python application, and size limits on the PEX zip file, etc., are all sidestepped.
Specifically, --venv mode was added in #1153. Going to grab that PR link, though, reminded me the existing --unzip mode should provide the same remedy in this case. Perhaps you could also or alternatively try that?
@tentwelfths I repro, although I get a different error message:
$ rm -rf big* && mkdir big && yes "#" | head -n 2000000000 > big/data.py && echo "import data; print(data.__file__)" > big/exe.py && ls -lh big/
total 3.8G
-rw-r--r-- 1 jsirois jsirois 3.8G Jan 7 11:17 data.py
-rw-r--r-- 1 jsirois jsirois 34 Jan 7 11:17 exe.py
$ pex -D big/ --entry-point exe -obig.pex && ls -lh big.pex
-rwxr-xr-x 1 jsirois jsirois 4.1M Jan 7 11:18 big.pex
$ ./big.pex
Traceback (most recent call last):
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 487, in execute
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 404, in _wrap_coverage
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 435, in _wrap_profiling
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 543, in _execute
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 645, in execute_entry
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 653, in execute_module
File "/usr/lib/python3.9/runpy.py", line 213, in run_module
return _run_code(code, {}, init_globals, run_name, mod_spec)
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/exe.py", line 1, in <module>
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 982, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 925, in _find_spec
File "<frozen importlib._bootstrap_external>", line 1349, in find_spec
File "<frozen importlib._bootstrap_external>", line 1323, in _get_spec
File "<frozen importlib._bootstrap_external>", line 1304, in _legacy_get_spec
File "<frozen importlib._bootstrap>", line 423, in spec_from_loader
File "<frozen importlib._bootstrap_external>", line 656, in spec_from_file_location
File "<frozen zipimport>", line 191, in get_filename
File "<frozen zipimport>", line 709, in _get_module_code
File "<frozen zipimport>", line 560, in _get_data
OSError: zipimport: can't read data
But the --unzip remedy works. Here I simply use the runtime PEX_UNZIP equivalent (see pex --help-variables) instead of rebuilding the PEX file with --unzip. Slow 1st run when the initial unzip happens, fastish after that:
$ time PEX_UNZIP=1 ./big.pex
/home/jsirois/.pex/unzipped_pexes/d5e0ee5a82eafd6fe49ccc04bc54f8ec86a7218c/data.py
real 0m58.322s
user 0m46.461s
sys 0m4.554s
$ time PEX_UNZIP=1 ./big.pex
/home/jsirois/.pex/unzipped_pexes/d5e0ee5a82eafd6fe49ccc04bc54f8ec86a7218c/data.py
real 0m0.584s
user 0m0.448s
sys 0m0.034s
There is also allowZip64=True: https://stackoverflow.com/questions/29830531/python-using-zip64-extensions-when-compressing-large-files ?
That's the default for all versions of Python Pex supports save 2.7. IOW, building @tentwelfths' PEX file would have failed in the 1st place if using 2.7, implying they used 3.x. It seems like an issue not in the zipfile stdlib but in the CPython code that implements zipimport. I.e., changing my repro above a bit I get:
$ python2.7 -mpex -D big/ --entry-point exe -obig.pex --venv && ls -lh big.pex
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/jsirois/dev/pantsbuild/pex/pex/__main__.py", line 8, in <module>
__name__ == "__main__" and pex.main()
File "pex/bin/pex.py", line 1048, in main
deterministic_timestamp=not options.use_system_time,
File "pex/pex_builder.py", line 561, in build
self._chroot.zip(tmp_zip, mode="a", deterministic_timestamp=deterministic_timestamp)
File "pex/common.py", line 666, in zip
write_entry(f)
File "pex/common.py", line 646, in write_entry
zf.writestr(zip_entry.info, zip_entry.data)
File "/usr/lib/python2.7/zipfile.py", line 1257, in writestr
self._writecheck(zinfo)
File "/usr/lib/python2.7/zipfile.py", line 1137, in _writecheck
" would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions
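The same writer-side refusal can be reproduced on Python 3 by passing allowZip64=False explicitly. The sketch below trips the entry-count limit rather than the 2 GB size limit, since that is far cheaper to hit in a demo:

```python
import io
import zipfile

# With allowZip64=False (the Python 2.7 default), the zipfile writer
# raises LargeZipFile as soon as ZIP64 would be required. The
# entry-count limit (65,535 files) triggers long before 2 GiB of data
# would, which keeps this demo fast.
buf = io.BytesIO()
try:
    with zipfile.ZipFile(buf, "w", allowZip64=False) as zf:
        for i in range(70_000):
            zf.writestr("file_%d.txt" % i, "")
except zipfile.LargeZipFile as e:
    print("refused:", e)
```

So on 2.7 the failure happens loudly at build time, whereas on 3.x the build succeeds and the failure is deferred to zipimport at run time.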
I'm going to guess the CPython code doesn't re-implement zip but links a zip library, and that's where the variance @dgkatz noted comes from. Some versions of that library support this and some don't ... that seems likely anyhow.
Eek - I was living a fantasy. The zipimport lib is written in Python and does not track the zipfile stdlib: https://github.com/python/cpython/blob/3.9/Lib/zipimport.py#L360 - so it just gets this wrong and assumes 32 bits straight up. So this does not explain the @dgkatz works-some-places-and-not-others OP, but we never had clinching evidence that the @tentwelfths issue was exactly the @dgkatz issue anyhow.
And .. that zipimport shortcoming is documented here: https://bugs.python.org/issue32959
I'm trying to put torch in a pex and am hitting this problem, since torch unzipped is 3 GB, and zipped a bit more than 2 GB. Any suggestion here, or any way to split a pex in 2, maybe?
@rom1504 it depends what you want to do with the PEX. Since you'll already have to split it in 2 and have more than 1 file to ship around, perhaps you're OK with --layout packed or --layout loose? If your Pex doesn't have that feature, please try using a newer version of Pex, reading the --layout help, and giving it a whirl. You'll get a directory instead of a zip though.
There were a few hot fixes after the initial release here: https://github.com/pantsbuild/pex/releases/tag/v2.1.48 but it's been stable for a while now and is used by stable Pants, for example, for all the internal PEXes it creates to improve caching characteristics.
For example:
$ pex example ansicolors requests -o zipapp.pex
$ ls -lh zipapp.pex
-rwxr-xr-x 1 jsirois jsirois 1.4M Feb 17 14:13 zipapp.pex
$ ./zipapp.pex -c 'import colors; print(colors.__file__)'
/home/jsirois/.pex/installed_wheels/f25c1d6c49102373d349f5f8f1cddc41ce409e15/ansicolors-1.1.8-py2.py3-none-any.whl/colors/__init__.py
$ pex --layout packed example ansicolors requests -o packed.pex
$ tree -ah packed.pex/
[4.0K] packed.pex/
├── [409K] .bootstrap
├── [4.0K] .deps
│ ├── [ 21K] ansicolors-1.1.8-py2.py3-none-any.whl
│ ├── [149K] certifi-2021.10.8-py2.py3-none-any.whl
│ ├── [ 82K] charset_normalizer-2.0.12-py3-none-any.whl
│ ├── [1.8K] example-0.1.0-py3-none-any.whl
│ ├── [ 82K] idna-3.3-py3-none-any.whl
│ ├── [197K] requests-2.27.1-py2.py3-none-any.whl
│ ├── [ 46K] six-1.16.0-py2.py3-none-any.whl
│ └── [354K] urllib3-1.26.8-py2.py3-none-any.whl
├── [2.7K] __main__.py
└── [1.2K] PEX-INFO
1 directory, 11 files
$ packed.pex/__main__.py -c 'import colors; print(colors.__file__)'
/home/jsirois/.pex/installed_wheels/f25c1d6c49102373d349f5f8f1cddc41ce409e15/ansicolors-1.1.8-py2.py3-none-any.whl/colors/__init__.py
$ python packed.pex/ -c 'import colors; print(colors.__file__)'
/home/jsirois/.pex/installed_wheels/f25c1d6c49102373d349f5f8f1cddc41ce409e15/ansicolors-1.1.8-py2.py3-none-any.whl/colors/__init__.py
thanks!
my use case is https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#using-pex
I will give that --layout packed mode a try
Ok, I have not used Spark before, but I'd guess you want the following config, tweaking their example:
export PYSPARK_DRIVER_PYTHON=python # Do not set in cluster modes.
export PYSPARK_PYTHON=./pyspark_pex_env.pex/__main__.py
spark-submit --files $(find packed.pex -type f | xargs | tr ' ' ',') app.py
I confirm the --layout packed mode worked to produce this .pex folder (containing one wheel file per dependency), and I was able to run the __main__.py file locally with that.
This file layout should allow me to distribute this in 2 archives, one with the torch dependency and one with everything else, which is convenient.
I will next try to use this with pyspark. I will say here if this works as expected, but it seems it will.
It's indeed working well for my use case, thanks for the suggestion!
It seems this new format could be well adapted to splitting the build process into several parts, building each part independently, and maybe even in parallel, speeding up the pex generation, which can take several minutes today. Did you ever consider that option? I guess maybe Pants is doing that?
@rom1504 Pex does already build in parallel, using your number of cores by default (see --help for --jobs). The build process looks like:
1. Single subprocess: pip download (this performs the resolve and downloads wheels and sdists)
2. --jobs number of sub-processes: pip wheel (this builds any downloaded sdists into wheels)
3. --jobs number of sub-processes: pip install (this installs each wheel in its own directory)
The only remaining non-parallelized aspect of the PEX build for packed layout is zipping up all the individual wheel install chroots from step 3. That is done in serial: https://github.com/pantsbuild/pex/blob/7a6e9a46c7e4fc67c6d3f1a0fc19d5b204d5ee81/pex/pex_builder.py#L696-L719
@dgkatz - finally looping back: was your issue related to the huge PEX issue @tentwelfths and @rom1504 encountered (>2GB PEX)? If so, I'd like to close this issue, since @rom1504 confirms the --layout packed Pex option is a viable workaround.
@tentwelfths hopefully --layout packed gives you an escape hatch too when you simply can't pare down dependencies.
I just encountered a similar issue with a pex file that was ~700 MB (lower than the 2 GB reported earlier in this issue). Using --layout packed resolved this for me as well (we also resolved the problem by removing some unnecessary files that were making their way into the original pex file).
@rahul-theorem there are (at least) 2 axes that can cause a zip to use ZIP64 extensions: size and entry count. If there are more than 65,535 files in the zip, it will use ZIP64 extensions and won't boot under Python < 3.13.
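A quick check on that second axis, using the stdlib zipfile module (the helper name is mine, not anything Pex provides):

```python
import zipfile

# Count central-directory entries; more than 65,535 forces ZIP64
# extensions, which pre-3.13 zipimport cannot read. Hypothetical
# helper for illustration.
def zip_entry_count(path):
    with zipfile.ZipFile(path) as zf:
        return len(zf.infolist())
```

This explains how a ~700 MB PEX can still fail: the entry count, not the byte size, pushed it over the ZIP64 line.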
I'm going to close this as an answered question. In the meantime two things have improved:
- Pex now warns when the final PEX zip requires ZIP64 extensions: #2253
- Python 3.13 finally handles ZIP64 in zipimporter and can boot these beasts: https://docs.python.org/3.13/whatsnew/3.13.html#zipimport