/usr/bin/python3: can't find '__main__' module in 'blah.pex'

Open dgkatz opened this issue 4 years ago • 18 comments

I have a PEX file which I build via the following command: pex -v -r requirements.txt -c gunicorn -D . -o blah.pex. This constructs a PEX, installing the requirements and all files in the project, and setting the entry point to gunicorn.

The build process works just fine, but when it comes time to run the PEX, depending on where I run it (Ubuntu VM, Mac local, Ubuntu Docker), I sometimes get the following error: /usr/bin/python3: can't find '__main__' module in 'blah.pex'

When I unzip the PEX, I do see a file in there, so I'm not sure what the problem is.

Has anyone experienced this error? Any ideas on what the problem is?

dgkatz avatar Apr 17 '20 01:04 dgkatz

Okay, I recently had this problem as well, and it turns out that Python doesn't know how to handle .pex files that are larger than 2 GB in size, which gives you the incredibly useful "Can't find main" error. So check the size of your PEX; if it is >2 GB, it's time to go on a dependency diet or find other ways to package/deploy your project.

tentwelfths avatar Jan 07 '21 18:01 tentwelfths

Sorry to see this so late - thanks for the bump with data @tentwelfths. Perhaps you could try the next Pex release or Pex master, which now supports a --venv mode. If you build your PEX file using that flag, then when run the PEX file will unpack itself into a virtual environment under ~/.pex/venvs and re-execute from there. The upshot is the PEX runs just like any other Python application, and size limits on the PEX zip file, etc. are all sidestepped.

jsirois avatar Jan 07 '21 18:01 jsirois

Specifically, --venv mode was added in #1153. Going to grab that PR link though reminded me the existing --unzip mode should provide the same remedy in this case. Perhaps you could also or alternatively try that?

jsirois avatar Jan 07 '21 18:01 jsirois

@tentwelfths I repro, although I get a different error message:

$ rm -rf big* && mkdir big && yes "#" | head -n 2000000000 > big/ && echo "import data; print(data.__file__)" > big/ && ls -lh big/
total 3.8G
-rw-r--r-- 1 jsirois jsirois 3.8G Jan  7 11:17
-rw-r--r-- 1 jsirois jsirois   34 Jan  7 11:17
$ pex -D big/ --entry-point exe -obig.pex && ls -lh big.pex
-rwxr-xr-x 1 jsirois jsirois 4.1M Jan  7 11:18 big.pex
$ ./big.pex 
Traceback (most recent call last):
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/", line 487, in execute
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/", line 404, in _wrap_coverage
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/", line 435, in _wrap_profiling
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/", line 543, in _execute
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/", line 645, in execute_entry
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/", line 653, in execute_module
  File "/usr/lib/python3.9/", line 213, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.9/", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 982, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 925, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1349, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1323, in _get_spec
  File "<frozen importlib._bootstrap_external>", line 1304, in _legacy_get_spec
  File "<frozen importlib._bootstrap>", line 423, in spec_from_loader
  File "<frozen importlib._bootstrap_external>", line 656, in spec_from_file_location
  File "<frozen zipimport>", line 191, in get_filename
  File "<frozen zipimport>", line 709, in _get_module_code
  File "<frozen zipimport>", line 560, in _get_data
OSError: zipimport: can't read data

But the --unzip remedy works. Here I simply use the runtime PEX_UNZIP (see pex --help-variables) equivalent instead of rebuilding the PEX file with --unzip. Slow 1st run when the initial unzip happens, fastish after that:

$ time PEX_UNZIP=1 ./big.pex 

real	0m58.322s
user	0m46.461s
sys	0m4.554s
$ time PEX_UNZIP=1 ./big.pex 

real	0m0.584s
user	0m0.448s
sys	0m0.034s

jsirois avatar Jan 07 '21 19:01 jsirois

There is also allowZip64=True: ?

stuhood avatar Jan 08 '21 00:01 stuhood

That's the default for all versions of Python that Pex supports save 2.7. IOW, building @tentwelfths' PEX file would have failed in the first place if using 2.7, implying they used 3.x. It seems like the issue is not in the zipfile stdlib but in the CPython C code that implements zipimport. I.e., changing my repro above a bit I get:

$ python2.7 -mpex -D big/ --entry-point exe -obig.pex --venv && ls -lh big.pex
Traceback (most recent call last):
  File "/usr/lib/python2.7/", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/", line 72, in _run_code
    exec code in run_globals
  File "/home/jsirois/dev/pantsbuild/pex/pex/", line 8, in <module>
    __name__ == "__main__" and pex.main()
  File "pex/bin/", line 1048, in main
    deterministic_timestamp=not options.use_system_time,
  File "pex/", line 561, in build, mode="a", deterministic_timestamp=deterministic_timestamp)
  File "pex/", line 666, in zip
  File "pex/", line 646, in write_entry
  File "/usr/lib/python2.7/", line 1257, in writestr
  File "/usr/lib/python2.7/", line 1137, in _writecheck
    " would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions
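
The same write-side guard is still present in Python 3's zipfile when ZIP64 is explicitly disabled with allowZip64=False. The following is a minimal sketch, not part of the original repro: the helper name is made up for illustration, and it trips the guard through the entry count rather than the file size so that it runs quickly without a multi-gigabyte file.

```python
import io
import zipfile


def entries_before_zip64_error(limit=70_000):
    """Write empty entries with ZIP64 disabled and count how many fit.

    Hypothetical helper for illustration only. Stops at the first
    zipfile.LargeZipFile, which zipfile raises when the next entry
    would require ZIP64 extensions.
    """
    buf = io.BytesIO()
    written = 0
    with zipfile.ZipFile(buf, "w", allowZip64=False) as zf:
        try:
            for i in range(limit):
                zf.writestr(f"entry_{i}.txt", b"")
                written += 1
        except zipfile.LargeZipFile:
            # Raised before the offending entry is written, so the
            # archive can still be closed cleanly.
            pass
    return written


if __name__ == "__main__":
    print(entries_before_zip64_error())
```

With the default allowZip64=True the loop would run to completion and the archive would silently use ZIP64 extensions instead.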

jsirois avatar Jan 08 '21 01:01 jsirois

I'm going to guess the CPython code doesn't re-implement zip but links ziplib and that's where the variance @dgkatz noted comes from. Some versions of ziplib support this and some don't ... that seems likely anyhow.

jsirois avatar Jan 08 '21 01:01 jsirois

Eek - I was living a fantasy. The zipimport lib is written in Python and does not track the zipfile stdlib, so it just gets this wrong and assumes 32 bit straight up. So this does not explain the "works some places and not others" report in @dgkatz's OP, but we never had clinching evidence that the @tentwelfths issue was exactly the @dgkatz issue anyhow.

jsirois avatar Jan 08 '21 01:01 jsirois

And .. that zipimport shortcoming is documented here:

jsirois avatar Jan 08 '21 01:01 jsirois

I'm trying to put torch in a PEX and hitting this problem, since torch unzipped is 3 GB and zipped a bit more than 2 GB. Any suggestions here, or any way to split a PEX in 2 maybe?

rom1504 avatar Feb 17 '22 21:02 rom1504

@rom1504 it depends what you want to do with the PEX. Since you'll already have to split it in 2 and have more than 1 file to ship around, perhaps you're ok with --layout packed or --layout loose? If your Pex doesn't have that feature please try using a newer version of Pex, reading the --layout help and giving it a whirl. You'll get a directory instead of a zip though.

There were a few hot fixes after the initial release here: but it's been stable for a while now and is used by stable Pants, for example, for all the internal PEXes it creates to improve caching characteristics.

For example:

$ pex example ansicolors requests -o zipapp.pex
$ ls -lh zipapp.pex 
-rwxr-xr-x 1 jsirois jsirois 1.4M Feb 17 14:13 zipapp.pex
$ ./zipapp.pex -c 'import colors; print(colors.__file__)'

$ pex --layout packed example ansicolors requests -o packed.pex
$ tree -ah packed.pex/
[4.0K]  packed.pex/
├── [409K]  .bootstrap
├── [4.0K]  .deps
│   ├── [ 21K]  ansicolors-1.1.8-py2.py3-none-any.whl
│   ├── [149K]  certifi-2021.10.8-py2.py3-none-any.whl
│   ├── [ 82K]  charset_normalizer-2.0.12-py3-none-any.whl
│   ├── [1.8K]  example-0.1.0-py3-none-any.whl
│   ├── [ 82K]  idna-3.3-py3-none-any.whl
│   ├── [197K]  requests-2.27.1-py2.py3-none-any.whl
│   ├── [ 46K]  six-1.16.0-py2.py3-none-any.whl
│   └── [354K]  urllib3-1.26.8-py2.py3-none-any.whl
├── [2.7K]
└── [1.2K]  PEX-INFO

1 directory, 11 files
$ packed.pex/ -c 'import colors; print(colors.__file__)'
$ python packed.pex/ -c 'import colors; print(colors.__file__)'

jsirois avatar Feb 17 '22 22:02 jsirois

thanks! my use case is I will give that --layout packed mode a try

rom1504 avatar Feb 17 '22 22:02 rom1504

Ok, I have not used Spark before, but I'd guess you want the following config tweaking their example:

export PYSPARK_DRIVER_PYTHON=python  # Do not set in cluster modes.
export PYSPARK_PYTHON=./pyspark_pex_env.pex/
spark-submit --files $(find packed.pex -type f | xargs | tr ' ' ',')
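
The find | xargs | tr pipeline above builds a comma-separated list of every file in the packed PEX directory. A rough Python equivalent is sketched below; the function name is hypothetical, and the sketch only builds the argument string. It makes no claim about how Spark treats the files once they are shipped.

```python
import os
import sys


def spark_files_arg(pex_dir):
    """Return a comma-separated list of all regular files under pex_dir.

    Hypothetical helper mirroring `find pex_dir -type f | xargs | tr ' ' ','`.
    Unlike the shell pipeline, paths containing spaces are kept intact.
    """
    paths = []
    for root, _dirs, files in os.walk(pex_dir):
        for name in files:
            full = os.path.join(root, name)
            # `find -type f` skips symlinks, so do the same here.
            if not os.path.islink(full):
                paths.append(full)
    return ",".join(sorted(paths))


if __name__ == "__main__":
    # Example: python spark_files.py packed.pex
    print(spark_files_arg(sys.argv[1]))
```

The output can then be passed as the value of --files. Paths that themselves contain commas would still break the list, so this is only a convenience for ordinary wheel file names.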

jsirois avatar Feb 17 '22 22:02 jsirois

I confirm the --layout packed mode worked to produce this .pex folder (containing one wheel file per dependency), and I was able to run the file locally with that. This file layout should allow me to distribute this in 2 archives, one with the torch dependency and one with everything else, which is convenient. I will next try to use this with pyspark. I'll report back here if this works as expected, but it seems it will.

rom1504 avatar Feb 17 '22 22:02 rom1504

It's indeed working well for my use case, thanks for the suggestion!

It seems this new format could be well suited to splitting the build process into several parts and building each part independently, maybe even in parallel, speeding up PEX generation, which can take several minutes today. Did you ever consider that option? I guess maybe Pants is doing that?

rom1504 avatar Feb 18 '22 17:02 rom1504

@rom1504 Pex does already build in parallel using your number of cores by default (see --help for --jobs). The build process looks like:

  1. Single subprocess: pip download (this performs the resolve and downloads wheels and sdists)
  2. --jobs number of sub-processes: pip wheel (this builds any downloaded sdists into wheels)
  3. --jobs number of sub-processes: pip install (this installs each wheel in its own directory)

The only remaining non-parallelized aspect of the PEX build for packed layout is zipping up all the individual wheel install chroots from step 3. That is done in serial:
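
For illustration only, and not taken from Pex's source: because each wheel install directory is zipped independently, that last step is the kind of work that could in principle be fanned out across workers. The sketch below assumes a list of already-installed directories and uses hypothetical function names; it shows the general pattern with a thread pool, not how Pex actually zips its chroots.

```python
import os
import zipfile
from concurrent.futures import ThreadPoolExecutor


def zip_dir(src_dir, dest_zip):
    """Zip one directory tree into dest_zip, with paths relative to src_dir."""
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # keep traversal order deterministic
            for name in sorted(files):
                full = os.path.join(root, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    return dest_zip


def zip_dirs_in_parallel(src_dirs, out_dir, jobs=None):
    """Zip each directory in src_dirs to out_dir/<dirname>.zip concurrently.

    Hypothetical helper: `jobs` caps the number of worker threads, loosely
    analogous to a --jobs setting. Results come back in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    targets = [
        os.path.join(out_dir, os.path.basename(os.path.normpath(d)) + ".zip")
        for d in src_dirs
    ]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(zip_dir, src_dirs, targets))
```

Each worker writes to its own output file, so no locking is needed. Whether this is actually faster than a serial loop depends on the size of the directories and on disk throughput, so it would need measuring before drawing conclusions.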

jsirois avatar Feb 18 '22 17:02 jsirois

@dgkatz - finally looping back. Was your issue related to the huge PEX issue that @tentwelfths and @rom1504 encountered (>2GB PEX)? If so, I'd like to close this issue since @rom1504 confirms the --layout packed Pex option is a viable workaround.

@tentwelfths hopefully --layout packed gives you an escape hatch too when you simply can't pare down dependencies.

jsirois avatar Feb 18 '22 18:02 jsirois

I just encountered a similar issue w/ a pex file that was ~700M (lower than the 2G reported earlier in this issue). Using --layout packed resolved this for me as well (we also resolved the problem by removing some unnecessary files that were making their way into the original pex file)

rahul-theorem avatar Jun 08 '22 01:06 rahul-theorem

@rahul-theorem there are (at least) 2 axes which can cause a zip to use ZIP64 extensions: size and entry count. If there are >2^16 files in the zip, it will use ZIP64 extensions and won't boot under Python<3.13.
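
To check which axis applies to a given PEX zip, something like the following sketch can help. It is an approximation under stated assumptions: the function name is hypothetical, and the two limits are the thresholds CPython's zipfile writer is understood to apply (65,535 entries, and 2 GiB minus one byte for any single size or offset), so other zip tools may differ.

```python
import sys
import zipfile

# Assumed thresholds, mirroring CPython's zipfile writer.
ZIP64_SIZE_LIMIT = (1 << 31) - 1   # largest size/offset before ZIP64 is needed
ZIP64_COUNT_LIMIT = (1 << 16) - 1  # largest entry count a 16-bit field can hold


def zip64_triggers(zip_path):
    """Return human-readable reasons the archive likely needs ZIP64.

    Hypothetical helper. Accepts a path or a seekable file object.
    An empty list means neither axis was crossed.
    """
    triggers = []
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
    if len(infos) > ZIP64_COUNT_LIMIT:
        triggers.append(f"entry count {len(infos)} > {ZIP64_COUNT_LIMIT}")
    for info in infos:
        biggest = max(info.file_size, info.compress_size, info.header_offset)
        if biggest > ZIP64_SIZE_LIMIT:
            triggers.append(
                f"{info.filename}: size or offset {biggest} > {ZIP64_SIZE_LIMIT}"
            )
            break  # one example of the size axis is enough
    return triggers


if __name__ == "__main__":
    # Example: python check_zip64.py blah.pex
    print(zip64_triggers(sys.argv[1]) or "no ZIP64 trigger found")
```

A ~700M PEX that reports an entry-count trigger would be consistent with the second axis described above, and pruning unnecessary files would lower that count.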

jsirois avatar Aug 14 '24 18:08 jsirois

I'm going to close this as an answered question. In the meantime two things have improved:

  1. Pex now warns when the final PEX zip requires ZIP64 extensions: #2253
  2. Python 3.13 finally handles ZIP64 in zipimporter and can boot these beasts:

jsirois avatar Aug 14 '24 18:08 jsirois