
Cannot set certain XLA_ARGS for `PythonProcess`

Open samlobel opened this issue 1 year ago • 2 comments

When using local_mp, each process that uses jax spawns a huge number of threads. I'm running 128 actors, and each one spawns ~500 threads, meaning the program spawns over 50,000 threads!

This puts me over the ulimit for my university cluster, and I suspect it isn't performant either. The recommended solution is to set XLA_FLAGS="--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1". But for some reason this isn't working with PythonProcess. Here's the PythonProcess for each of my nodes:

      PythonProcess(env={
        "CUDA_VISIBLE_DEVICES": str(-1),
        "XLA_FLAGS": "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1",
      })

This results in the error `bash: line 1: XLA_FLAGS=--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1: command not found` in each process that uses a local resource with those envs. Why is the environment variable being treated as a command here? I've also tried enclosing the value in quotes, which did not work. Thank you!

samlobel avatar Mar 08 '23 19:03 samlobel

I've confirmed the problem is the inclusion of spaces.

      PythonProcess(env={
        "CUDA_VISIBLE_DEVICES": str(-1),
        "DUMMY_ARG": "isspace theproblem",
      })

This errors similarly.

samlobel avatar Mar 08 '23 19:03 samlobel

For anyone else who wants to set XLA_FLAGS, I found a workaround that involves editing your site-packages. I'm using the "tmux launcher" (file launchpad/launch/run_locally/local_tmux_launcher), which internally calls the (undocumented) subprocess.list2cmdline function on a list that looks like ["env1=val1", "env2=val2", "/path/to/python", "command_name.py"]. Ideally this turns into a command like env1=val1 env2=val2 /path/to/python command_name.py. But if any of the env values contains a space, it puts quotes around the whole key/value pair: env1=val1 "env2=spaced value" /path/to/python command_name.py. Bash does not treat a word that starts with a quote as a variable assignment, so this doesn't set the environment variable env2 but instead tries to run env2=spaced value as a command.
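The quoting behavior described above can be reproduced outside Launchpad with a minimal sketch. The argument list here is the illustrative one from this comment, not the exact list the tmux launcher builds:

```python
import subprocess

# Illustrative argument list: env assignments followed by the command.
args = ["env1=val1", "env2=spaced value", "/path/to/python", "command_name.py"]

# list2cmdline quotes any argument containing a space, so the whole
# "env2=spaced value" pair ends up inside double quotes.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
# env1=val1 "env2=spaced value" /path/to/python command_name.py
```

Because the second word now begins with a quote, the shell no longer sees it as an assignment prefix and looks for a command with that name.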

Maybe that's the intended behavior of subprocess.list2cmdline, but it prevents you from setting env variables with spaces in them. So I just edited the launcher to strip the quotation marks: cmd = cmd.replace('"', ""). I also backslash-escaped the spaces inside the XLA_FLAGS value.
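A less invasive variant of the same idea, as a sketch only: quote just the value of each assignment instead of the whole key/value pair. The `build_command` helper below is hypothetical and is not part of Launchpad; it uses the standard-library `shlex.quote`, which is meant for POSIX shells such as bash.

```python
import shlex

def build_command(env, argv):
    """Hypothetical helper: prefix argv with shell-safe KEY=value assignments."""
    # Quote only the value, so the word still starts with an unquoted KEY=
    # and the shell treats it as an assignment.
    assignments = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])

env = {
    "CUDA_VISIBLE_DEVICES": "-1",
    "XLA_FLAGS": "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1",
}
cmd = build_command(env, ["/path/to/python", "command_name.py"])
print(cmd)
```

Splitting the result back with `shlex.split(cmd)` yields one word per assignment, with the space preserved inside the XLA_FLAGS value.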

Would be great to get this fixed or documented as XLA_FLAGS must be a common use case for launchpad!

samlobel avatar Mar 13 '23 12:03 samlobel