
Allow setting multiple folders for PEX_ROOT

Open manuzhang opened this issue 2 years ago • 10 comments

It would be helpful to be able to set multiple folders for PEX_ROOT in case one folder is not writable due to permissions or a full disk.

manuzhang avatar Oct 24 '22 12:10 manuzhang

Can you explain a bit more? Do you have situations where building the PEX file with --runtime-pex-root X is not enough? If so, can you provide more details about your deployment targets that lead to this need?

jsirois avatar Oct 24 '22 13:10 jsirois

On our cluster, we usually have two disks on a node, and we won't decommission the node if only one disk is full or unhealthy. Say the two disks are mounted at /disk1 and /disk2; it would be great if we could set PEX_ROOT to /disk1,/disk2 and have pex find the first writable folder.

manuzhang avatar Oct 24 '22 15:10 manuzhang

Aha, ok. So the PEX runtime tries os.mkdir on /disk1, which nets an OSError (or IOError) due to a disk-full condition, and then falls back to /disk2. That makes sense to me. This is all self-contained in the pex.pex_info.PexInfo class, except for the --runtime-pex-root plumbing, which is in pex/bin/pex.py.
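For illustration, the fallback behavior described above could be sketched like this (`first_writable_root` is a hypothetical helper, not existing Pex API):

```python
import os
import tempfile


def first_writable_root(candidates):
    # Hypothetical helper sketching the fallback discussed above: return
    # the first candidate directory that can be created and written to.
    for root in candidates:
        try:
            os.makedirs(root, exist_ok=True)
            # Probe writability; a full disk surfaces here as an OSError
            # (errno ENOSPC), just like a permissions problem does.
            with tempfile.NamedTemporaryFile(dir=root):
                pass
            return root
        except OSError:
            continue
    raise RuntimeError("No writable PEX_ROOT among: {}".format(candidates))
```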

@manuzhang are you willing to contribute this feature?

jsirois avatar Oct 24 '22 15:10 jsirois

@jsirois thanks, I've submitted https://github.com/pantsbuild/pex/pull/1965. Please help approve to run workflow and review.

manuzhang avatar Oct 25 '22 02:10 manuzhang

@jsirois Our usage is for PySpark applications (https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-pex), and the pex root is set like spark.executorEnv.PEX_ROOT=/disk1/pex. /disk1 is shared by many applications besides pex and can become full. Meanwhile, the node (one of thousands) is managed by YARN, which has a health checker monitoring disk state; it's configured with two disks and won't assign applications to the node when both disks are unhealthy.

I'm thinking about adding a similar feature to pex.

manuzhang avatar Oct 26 '22 03:10 manuzhang

OK, thanks for the background. The health checker approach is the right one, but you need a daemon or a cron job for that. Pex is neither; it's a one-shot job, and what I'm trying to point out is that it's very hard to solve this with a one-shot job. That's not the right layer to solve this problem. The best way I can think of to solve this for a one-shot job is to make sure it uses a wrapper that does something like this (pseudo code - this could be a bash wrapper or even a Python wrapper):

pex_roots = [...]
for pex_root in pex_roots:
  os.environ["PEX_ROOT"] = pex_root
  try:
    sys.exit(run_job())
  except OSError as e:
    if not is_disk_full_condition(e):
      raise
sys.exit(f"None of the configured pex_roots are usable!: {pex_roots}")

On top of that, I'd suggest that is_disk_full_condition is not an easy test to write. Perhaps there is a specific OS error string you can grep for in the exception that indicates disk full vs. some other condition? You definitely don't want to be retrying a PEX when it fails for other reasons.
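For what it's worth, on POSIX systems one sketch of such a check could key off errno rather than grepping error strings. This is an illustrative assumption, not something Pex ships; including EDQUOT (quota exceeded) is a judgment call, and it is not defined on every platform:

```python
import errno

# Errno values that plausibly mean "disk full". EDQUOT is not defined on
# all platforms, hence the getattr guard.
_DISK_FULL_ERRNOS = {errno.ENOSPC, getattr(errno, "EDQUOT", None)}


def is_disk_full_condition(exc):
    # Only treat genuine out-of-space errors as retryable; permission
    # errors, bad paths, etc. carry different errno values and should
    # propagate instead of triggering a retry.
    return isinstance(exc, EnvironmentError) and exc.errno in _DISK_FULL_ERRNOS
```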

jsirois avatar Oct 26 '22 03:10 jsirois

Maybe this makes things clearer. Pex can restart itself if needed, since it knows what its constraints are. It cannot, however, restart user code - that might be disastrous. Who knows what that user code does; maybe it withdraws money from a bank - poorly. As such, the best you can ever hope for here is that some form of new code / logic in Pex ensures user code never runs unless the PEX boot has probed all possible pex roots to find one it can use to get fully booted. Even then, once the handoff to user code happens, it might try to write some tempfile, hit disk full, and everything crashes anyhow, outside the control of the PEX boot code. It really does seem to me that if your environment runs into problems like this frequently enough to want to solve the issue, you need to do it at a higher system layer.

jsirois avatar Oct 26 '22 04:10 jsirois

@jsirois thanks for everything you've said. The error pex throws looks like this:

dagster.pex/.bootstrap/pex/common.py", line 221, in _extract_member
    File "/usr/share/anaconda3/python3.7/lib/python3.7/zipfile.py", line 1690, in _extract_member
      open(targetpath, "wb") as target:
  OSError: [Errno 28] No space left on device: 

The higher layer, in our case, is Spark, which has no idea what's going on on the Python side, and YARN, which handles disk health but doesn't integrate with pex. Another idea is that pex could randomly choose from a list of PEX_ROOT folders, and Spark could retry the task (restarting pex) on failure, so eventually we'd hit the healthy disk.
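As a sketch of that random-choice idea (`launch_with_random_pex_root` is a hypothetical wrapper, not Spark or Pex API), a launch script could look like:

```python
import os
import random
import subprocess


def launch_with_random_pex_root(pex_roots, argv):
    # Hypothetical wrapper: pick a PEX_ROOT at random so that, combined
    # with Spark's task retries, a rerun eventually lands on a healthy disk.
    env = dict(os.environ)
    env["PEX_ROOT"] = random.choice(pex_roots)
    return subprocess.call(argv, env=env)
```

Note this relies on Spark's existing retry mechanism to do the looping; the wrapper itself stays one-shot, which fits the constraint discussed above.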

manuzhang avatar Oct 26 '22 04:10 manuzhang

Ok, well hopefully you've taken in everything to consider and can come up with a reasonable design. I'm not seeing that design myself.

Make sure you address at least these questions:

  1. Why it's not feasible to implement a wrapper that does the retrying, as sketched above.
  2. Pex runs on macOS and Linux (and soon Windows) and Python 2.7 through 3.11 - how does Pex detect disk full robustly on all these systems? It seems like you have a way to detect this on Linux. Perhaps Errno 28 is https://docs.python.org/3/library/errno.html#errno.ENOSPC. The trick is making sure you have a solid means of detection for all Pex use cases, not just yours.
  3. Pex uses PexInfo.pex_root in several places - how will you address detecting disk full robustly in all these places?:
    $ git grep "\.pex_root" pex/
    pex/bin/pex.py:    pex_info.pex_root = options.runtime_pex_root
    pex/bin/pex.py:        pex_root = pex_info.pex_root
    pex/commands/command.py:                if options.pex_root and options.cache_dir != options.pex_root:
    pex/commands/command.py:                elif options.pex_root:
    pex/commands/command.py:                pex_root = options.cache_dir or options.pex_root or ENV.PEX_ROOT
    pex/environment.py:            pex_root = pex_info.pex_root
    pex/pex_bootstrapper.py:                short_venv_dir = os.path.join(pex_info.pex_root, "venvs", "s", entropy)
    pex/pex_bootstrapper.py:                                pex_root=pex_info.pex_root,
    pex/pex_bootstrapper.py:    with ENV.patch(PEX_ROOT=pex_info.pex_root):
    pex/pex_builder.py:            os.path.join(pex_info.pex_root, "bootstrap_zips", bootstrap_hash)
    pex/pex_builder.py:                    os.path.join(pex_info.pex_root, "installed_wheel_zips", fingerprint)
    pex/pex_info.py:        return self._venv_dir(self.pex_root, pex_file, interpreter)
    pex/pex_info.py:        return os.path.join(self.pex_root, self.BOOTSTRAP_CACHE, self.bootstrap_hash)
    pex/pex_info.py:        return os.path.join(self.pex_root, self.INSTALL_CACHE)
    pex/pex_info.py:        return os.path.join(self.pex_root, "user_code")
    pex/tools/commands/venv.py:                safe_rmtree(pex.pex_info().pex_root)
    
  4. If you don't want to tackle 3 above, but instead just try to detect disk full once in the PexInfo.pex_root property, like the perm check is done today, how will you test for disk full, and will that check be performant enough that it can be done for all Pex use cases without ~any performance impact? Remember that in your case you may be willing to burn 50ms to determine the correct pex_root to use, but other users will not be happy about this perf regression.
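On point 4, one cheap probe that avoids any writes would be a single free-space query per candidate, roughly as below. This is an illustrative sketch, not Pex behavior: `pex_root_has_space` and the 64 MiB threshold are assumptions, and shutil.disk_usage requires Python 3.3+, so this alone wouldn't cover Pex's Python 2.7 support:

```python
import shutil


def pex_root_has_space(path, min_free_bytes=64 * 1024 * 1024):
    # One statvfs-backed call, no writes, so the cost is on the order of
    # microseconds. Note this only checks free space at selection time;
    # the disk can still fill up later, mid-run.
    try:
        return shutil.disk_usage(path).free >= min_free_bytes
    except OSError:
        return False
```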

jsirois avatar Oct 26 '22 07:10 jsirois

@jsirois thanks again for the thorough reply. I understand now that it's not feasible to detect disk full in pex, so I'm not going down that road anymore.

Spark runs Python code as a black box, so it has no idea what's running inside; it could be a virtual environment like pex, anaconda, or a Python lib on the system. Hence, my last idea is a wrapper at a higher level.

manuzhang avatar Oct 26 '22 09:10 manuzhang

@manuzhang it sounds like you've accepted solving the disk-full condition with a higher-level wrapper. As such, I'll close this issue and the associated PR.

jsirois avatar Aug 17 '24 03:08 jsirois