launchpad
Cannot set certain `XLA_FLAGS` for `PythonProcess`
When using `local_mp`, each process that uses JAX spawns a huge number of threads. I'm running 128 actors, and each one spawns ~500 threads, so the program spawns roughly 64,000 threads in total!

This puts me over the `ulimit` on my university cluster, and I suspect it isn't performant either. The recommended solution is to set `XLA_FLAGS="--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1"`, but for some reason this isn't working with `PythonProcess`. Here's the `PythonProcess` I use for each of my nodes:
```python
PythonProcess(env={
    "CUDA_VISIBLE_DEVICES": str(-1),
    "XLA_FLAGS": "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1",
})
```
This results in the error `bash: line 1: XLA_FLAGS=--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1: command not found` in each process that uses a local resource with those envs. Why is the environment variable being treated as a command here? I've also tried enclosing the value in quotes, which did not work. Thank you!
I've confirmed the problem is the spaces in the value. This:

```python
PythonProcess(env={
    "CUDA_VISIBLE_DEVICES": str(-1),
    "DUMMY_ARG": "isspace theproblem",
})
```

fails with the same kind of error.
For anyone else who wants to set `XLA_FLAGS`, I found a workaround that involves editing the installed launchpad package in your `site-packages`. I'm using the tmux launcher (file `launchpad/launch/run_locally/local_tmux_launcher`), which internally calls the (undocumented) `subprocess.list2cmdline` function on a list that looks like `["env1=val1", "env2=val2", "/path/to/python", "command_name.py"]`. Ideally this turns into a command like `env1=val1 env2=val2 /path/to/python command_name.py`. But if any env value contains a space, it puts quotes around the whole key/value pair: `env1=val1 "env2=spaced value" /path/to/python command_name.py`. Bash only treats `NAME=value` as an assignment when it is unquoted, so this doesn't set the environment variable `env2`; instead bash tries to run `env2=spaced value` as a command.
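A minimal, stdlib-only sketch that reproduces the quoting described above. The argument list is the example list from this issue; the code does not call launchpad, so it only shows what `subprocess.list2cmdline` does with a list of that shape, not the launcher's exact code path.

```python
import subprocess

# Example argument list in the same shape as the one described above:
# env assignments first, then the interpreter and the script.
args = ["env1=val1", "env2=spaced value", "/path/to/python", "command_name.py"]

# list2cmdline wraps any element containing a space or tab in double quotes,
# so the whole "env2=spaced value" assignment becomes one quoted word.
cmd = subprocess.list2cmdline(args)
print(cmd)
# env1=val1 "env2=spaced value" /path/to/python command_name.py
```

Pasting that string into bash gives the same `command not found` error as above, because the quoted word is no longer recognized as an assignment.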
That may well be intended behavior for `subprocess.list2cmdline`, which follows the Windows (MS C runtime) quoting rules rather than POSIX shell rules, but it prevents you from setting env variables that contain spaces. So I edited the launcher to strip the quotation marks with `cmd = cmd.replace('"', "")`, and used backslash escaping on the spaces inside the `XLA_FLAGS` value.
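A sketch of why the quote-stripping plus backslash-escaping workaround yields a usable command. It re-creates the steps in plain Python (it does not patch launchpad) and uses `shlex.split`, which follows POSIX shell word-splitting rules, as a stand-in for how bash would tokenize the result. One caveat, based on reading the code rather than testing it against launchpad: `cmd.replace('"', "")` removes every double quote in the command, so any other argument that legitimately contains one would be mangled.

```python
import shlex
import subprocess

# Spaces inside the XLA_FLAGS value are escaped with a backslash up front.
xla_flags = r"--xla_cpu_multi_thread_eigen=false\ intra_op_parallelism_threads=1"
args = [
    "CUDA_VISIBLE_DEVICES=-1",
    "XLA_FLAGS=" + xla_flags,
    "/path/to/python",
    "command_name.py",
]

# list2cmdline still wraps the XLA_FLAGS assignment in double quotes; the
# backslash is kept as-is because it is not followed by a double quote.
cmd = subprocess.list2cmdline(args)
# The workaround then strips every double quote from the command string.
cmd = cmd.replace('"', "")
print(cmd)

# A POSIX shell turns "\ " back into a plain space, leaving the assignment as a
# single unquoted word, which bash recognizes as an environment variable.
tokens = shlex.split(cmd)
print(tokens[1])
```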
Would be great to get this fixed or documented, as `XLA_FLAGS` must be a common use case for launchpad!
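For whoever picks this up, here is a minimal sketch of one possible fix, assuming the launcher builds a POSIX shell command from an env dict plus an argv list. `build_shell_command` is a hypothetical helper, not launchpad's API. The idea is to leave `NAME=` unquoted and apply `shlex.quote` to the value only, so bash still sees an assignment. Quoting the whole `NAME=value` pair (for example with `shlex.quote` or `shlex.join`) would reproduce the same bug.

```python
import re
import shlex
from typing import Mapping, Sequence

# Valid POSIX shell variable names: letters, digits, underscore; no leading digit.
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_shell_command(env: Mapping[str, str], argv: Sequence[str]) -> str:
    """Hypothetical helper: render `NAME=value ... argv` for a POSIX shell."""
    parts = []
    for name, value in env.items():
        if not _ENV_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
        # Quote only the value, so the shell still parses NAME=... as an assignment.
        parts.append(f"{name}={shlex.quote(value)}")
    # Each command word is quoted individually, as shlex.join would do.
    parts.extend(shlex.quote(arg) for arg in argv)
    return " ".join(parts)


cmd = build_shell_command(
    {
        "CUDA_VISIBLE_DEVICES": str(-1),
        "XLA_FLAGS": "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1",
    },
    ["/path/to/python", "command_name.py"],
)
print(cmd)
# CUDA_VISIBLE_DEVICES=-1 XLA_FLAGS='--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1' /path/to/python command_name.py
```

Prefixing the command with the `env` utility and passing each `NAME=value` as its own fully quoted argument would also work, since `env` receives each word after the shell has already removed the quotes.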