[Core] Ray 2.47 regression: All tasks hang when using `uv`
What happened + What you expected to happen
Running any Ray job just hangs after I upgraded to Ray 2.47. I use uv for environment management, which may be part of the issue judging by the error messages below. I can confirm the script below runs fine with Ray 2.46.
I have no idea what the issue might be; happy to give more information as needed. This may be related to https://github.com/ray-project/ray/pull/53060 as well.
Please see logs below:
2025-06-16 05:07:01,222 INFO worker.py:1917 -- Started a local Ray instance.
2025-06-16 05:07:01,240 INFO packaging.py:588 -- Creating a file package for local module '/home/coder/Research/Setup/uv_base'.
2025-06-16 05:07:01,263 INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_dbbf75d97baeb29d.zip' (1.63MiB) to Ray cluster...
2025-06-16 05:07:01,272 INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_dbbf75d97baeb29d.zip'.
(raylet) error: unexpected argument '--node-ip-address' found
(raylet)
(raylet) tip: a similar argument exists: '--no-editable'
(raylet)
(raylet) Usage: uv run --with <WITH> --no-editable
(raylet)
(raylet) For more information, try '--help'.
(raylet)
(raylet)
(raylet) Usage: uv run --with <WITH> --no-editable
(raylet)
(raylet)
(raylet)
(raylet) Usage: uv run --with <WITH> --no-editable
(raylet)
(raylet) [2025-06-16 05:08:02,689 E 2371813 2371813] (raylet) worker_pool.cc:586: Some workers of the worker process(2372264) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet) error: unexpected argument '--node-ip-address' found [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(raylet) tip: a similar argument exists: '--no-editable' [repeated 2x across cluster]
(raylet) For more information, try '--help'. [repeated 2x across cluster]
(raylet)
(raylet)
(raylet) Usage: uv run --with <WITH> --no-editable
(raylet)
(raylet)
(raylet)
(raylet) Usage: uv run --with <WITH> --no-editable
(raylet)
(raylet)
(raylet)
(raylet) Usage: uv run --with <WITH> --no-editable
(raylet)
Versions / Dependencies
Ray 2.47, Python 3.11, uv 0.7.13
Reproduction script
The script is taken from the docs:
import subprocess
import ray

zen_of_python = subprocess.check_output(["python", "-c", "import this"])
corpus = zen_of_python.split()

num_partitions = 3
chunk = len(corpus) // num_partitions
partitions = [corpus[i * chunk : (i + 1) * chunk] for i in range(num_partitions)]


def map_function(document):
    for word in document.lower().split():
        yield word, 1


@ray.remote
def apply_map(corpus, num_partitions=3):
    map_results = [list() for _ in range(num_partitions)]
    for document in corpus:
        for result in map_function(document):
            first_letter = result[0].decode("utf-8")[0]
            word_index = ord(first_letter) % num_partitions
            map_results[word_index].append(result)
    return map_results


map_results = [
    apply_map.options(num_returns=num_partitions).remote(data, num_partitions)
    for data in partitions
]

for i in range(num_partitions):
    mapper_results = ray.get(map_results[i])
    for j, result in enumerate(mapper_results):
        print(f"Mapper {i}, return value {j}: {result[:2]}")
Issue Severity
Medium: It is a significant difficulty but I can work around it.
Thanks for reporting this -- can you say more about the exact uv environment you are running in (e.g. are you using a pyproject.toml, and if so, what does it look like)?
I've tried to repro your issue on the Ray 2.47 Docker image anyscale/ray:2.47.0-py312-cu128 with just uv run, and that works for me.
I'm currently working on making the parsing of the uv run command line more robust (https://github.com/ray-project/ray/pull/53838), so if it is related to that I'd love to have a look if that fixes it or if anything more is needed.
+1, I get something similar:
>>> uv run proteus describe ...
/Users/jjahn/Desktop/Exa/monorepo/python/shared/exa_ml/.venv/lib/python3.10/site-packages/fs/__init__.py:4: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__) # type: ignore
2025-06-16 17:13:02,778 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8266
2025-06-16 17:13:02,817 INFO packaging.py:588 -- Creating a file package for local module '/Users/jjahn/Desktop/Exa/monorepo/python/shared/exa_ml'.
2025-06-16 17:13:02,857 INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_b181dc551557e89f.zip' (2.49MiB) to Ray cluster...
2025-06-16 17:13:02,860 INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_b181dc551557e89f.zip'.
2025-06-16 17:13:03,998 INFO parquet_datasource.py:226 -- Filtered out 655 paths
(raylet) warning: `VIRTUAL_ENV=/Users/jjahn/Desktop/Exa/monorepo/python/shared/exa_ml/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
(raylet) Using CPython 3.10.15
(raylet) Creating virtual environment at: .venv
(raylet) error: Failed to generate package metadata for `dataset==0.1.0 @ directory+../dataset`
(raylet) Caused by: Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataset
(raylet) Caused by: error: Failed to generate package metadata for `dataset==0.1.0 @ directory+../dataset`
(raylet) Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataset Caused by: Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataset
(raylet)
(raylet) error: Failed to generate package metadata for `dataset==0.1.0 @ directory+../dataseterror: Failed to generate package metadata for ``
(raylet) dataset==0.1.0 @ directory+../dataset`
(raylet) Caused by: Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataseterror: Failed to generate package metadata for `dataset==0.1.0 @ directory+../dataset
(raylet) `
(raylet) Caused by: Caused by: Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataset
(raylet) Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataset
(raylet) [2025-06-16 17:14:04,172 E 52203 22370359] (raylet) worker_pool.cc:586: Some workers of the worker process(52223) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet) warning: `VIRTUAL_ENV=/Users/jjahn/Desktop/Exa/monorepo/python/shared/exa_ml/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead [repeated 9x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(raylet) error: Failed to generate package metadata for `dataset==0.1.0 @ directory+../dataset` [repeated 5x across cluster]
(raylet) Caused by: Distribution not found at: file:///private/tmp/ray/session_2025-06-16_17-13-01_243361_52182/runtime_resources/working_dir_files/dataset [repeated 4x across cluster]
Metadata Fetch Progress 0: 0%| | 0.00/21.0 [01:39<?, ? task/s]
Metadata Fetch Progress 0: 0%| | 0.00/21.0 [01:40<?, ? task/s]
I am also encountering hanging with Ray 2.47.0 when running in GitHub Actions on an Ubuntu image with a uv venv. Both Python 3.12.10 and 3.13.3 are affected. Pytest starts but no tests run. There is no log output to speak of, so it's difficult to know the exact cause; however, pinning Ray to 2.46.0 resolves the issue. I don't know if this is related to uv. Interestingly, I can run the test suite locally with the same setup (uv 0.7.13, Python 3.13.3, Ray 2.47.0) and it works fine. Quite mysterious...
In case it helps, for context, we have this pytest fixture in our top-level conftest.py, which creates and tears down a Ray cluster for use throughout the test suite. Given that I can see the pytest message stating that it has "collected X tests" but not a single log message with test progress, I'm guessing it hangs while setting up the fixtures (I could be wrong; I'm no expert on pytest internals).
@pytest.fixture(scope="session")
def ray_ctx() -> _t.Iterator[None]:
"""Initialises and shuts down Ray."""
ray.init(num_cpus=2, num_gpus=0, include_dashboard=False)
yield
ray.shutdown()
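For illustration, the tests themselves just request that fixture; a hypothetical minimal test (the task and test name here are made up, not our actual code) would look like:

import ray


@ray.remote
def _double(x: int) -> int:
    # Trivial stand-in task; the real suite runs our library code instead.
    return 2 * x


def test_ray_smoke(ray_ctx: None) -> None:
    # The session-scoped ray_ctx fixture has already called ray.init(),
    # so the test only submits work and checks the result.
    assert ray.get(_double.remote(21)) == 42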
I'm using a uv environment such that I have a base environment running the Jupyter server (this is at /home/coder/Research/Setup/uv_base), and then I register an IPython kernel for my repo following the uv docs. I then connect to this server from VS Code and select the kernel registered for my repo to run an IPython notebook, which has the code above in one of its cells.
I have a pyproject.toml, which I've been able to reduce to the below:
[project]
name = "test"
version = "0.1.0"
description = "test"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "ray[data]==2.47",
]

[dependency-groups]
dev = [
    "ipykernel",
]
What is strange is that if I create a different repo with the same pyproject.toml and the same directory structure (but at a different location) and register its kernel in the same way, I cannot reproduce the issue. I have no idea how that is possible. I tried deleting the .venv folder in the repo giving me the issue and rebuilding it, but that didn't solve the problem. Stranger still, if I select this other, identical kernel registered with the different repo, the file in the original repo runs fine. Mysterious. Happy to supply more logs / info as needed...
Maybe importantly, I get a lot fewer logs from the working version - the output is just:
2025-06-17 05:12:48,269 INFO worker.py:1917 -- Started a local Ray instance.
Mapper 0, return value 0: [(b'of', 1), (b'is', 1)]
Mapper 0, return value 1: [(b'python,', 1), (b'peters', 1)]
Mapper 0, return value 2: [(b'the', 1), (b'zen', 1)]
Mapper 1, return value 0: [(b'unless', 1), (b'in', 1)]
Mapper 1, return value 1: [(b'although', 1), (b'practicality', 1)]
Mapper 1, return value 2: [(b'beats', 1), (b'errors', 1)]
Mapper 2, return value 0: [(b'is', 1), (b'is', 1)]
Mapper 2, return value 1: [(b'although', 1), (b'a', 1)]
Mapper 2, return value 2: [(b'better', 1), (b'than', 1)]
I had the same issue in GitHub Actions after upgrading to Ray 2.47.1 in a project that uses uv.
(raylet) warning: `VIRTUAL_ENV=/home/user/work/project-name/project-name/consumers/module/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
(raylet) Using CPython 3.10.17 interpreter at: /opt/hostedtoolcache/Python/3.10.17/x64/bin/python3
(raylet) Creating virtual environment at: .venv
(raylet) Building package-name @ file:///tmp/ray/session_2025-06-18_14-31-34_052948_2068/runtime_resources/working_dir_files/_ray_pkg_3ba2ebeada5159c9
(raylet) Built package-name @ file:///tmp/ray/session_2025-06-18_14-31-34_052948_2068/runtime_resources/working_dir_files/_ray_pkg_3ba2ebeada5159c9
(raylet) Installed 88 packages in 147ms
(raylet) Traceback (most recent call last):
(raylet) File "/home/user/work/project-name/project-name/consumers/module/.venv/lib/python3.10/site-packages/ray/_private/workers/default_worker.py", line 8, in <module>
(raylet) import ray
(raylet) ModuleNotFoundError: No module named 'ray'
(raylet) [2025-06-18 14:32:41,156 E 2151 2151] (raylet) worker_pool.cc:586: Some workers of the worker process(2380) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet) warning: `VIRTUAL_ENV=/home/user/work/project-name/project-name/consumers/module/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
(raylet) Traceback (most recent call last):
(raylet) File "/home/user/work/project-name/project-name/consumers/module/.venv/lib/python3.10/site-packages/ray/_private/workers/default_worker.py", line 8, in <module>
(raylet) import ray
(raylet) ModuleNotFoundError: No module named 'ray'
(raylet) [2025-06-18 14:33:42,151 E 2151 2151] (raylet) worker_pool.cc:586: Some workers of the worker process(2421) have not registered within the timeout. The process is dead, probably it crashed during start.
It creates a separate virtual environment instead of using the existing one. As a result, there is no ray inside it and the worker keeps crashing.
Disabling the uv runtime env propagation with an environment variable fixes it:
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0
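For reference, a minimal sketch of setting this from Python before starting Ray (assuming a local cluster started by the driver; exporting the variable in the shell before uv run should be equivalent):

import os

# Disable Ray's propagation of the `uv run` command to workers, so they
# reuse the interpreter that launched them instead of building a new venv.
os.environ["RAY_ENABLE_UV_RUN_RUNTIME_ENV"] = "0"

import ray  # imported after setting the variable, to be safe

ray.init()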
Can confirm RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 works for me as well.
Yep, I'm using Nix now, so there's no need for uv; it already handles the venvs.
Thank you for saving my evening @p1c2u - I was pulling my hair out trying to figure out why this stopped working and thought it was a uv issue.
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 did not solve this problem for me - ray still creates a new virtualenv.
What worked for me is using a double uv run:
uv run uv run --active --no-sync run.py
The first uv run syncs the requirements and sets up the virtualenv path. The second uv run --active forces the environment configured by the first command to be used in all subsequent calls, and --no-sync makes sure that calls within Ray do not modify the environment.
I'm using uv==0.7.8 and ray==2.47.1. I've upgraded uv to 0.7.20 and it looks pretty much the same to me.
Just a quick comment to say this seems to be resolved for me with Ray 2.48 once I set the runtime environment via
ray.init(
    runtime_env={
        "working_dir": [...],
        "excludes": [".git", ".venv", ...],
    }
)
Note that I have to exclude .venv as otherwise I get
RuntimeEnvSetupError: Failed to set up runtime environment.
Failed to upload working_dir [...] to the Ray cluster: Package size (638.82MiB) exceeds the maximum size of 512.00MiB. You can exclude large files using the 'excludes' option to the runtime_env or provide a remote URI of a zip file using protocols such as 's3://', 'https://' and so on, refer to https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#api-reference.
I then still get warnings like:
(raylet) warning: `VIRTUAL_ENV=[...]/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
(raylet) Using CPython 3.11.0rc1 interpreter at: /usr/bin/python3.11
(raylet) Creating virtual environment at: .venv
(raylet) warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
(raylet) If the cache and target directories are on different filesystems, hardlinking may not be supported.
(raylet) If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
(raylet) Installed 137 packages in 7.16s
(raylet) warning: `VIRTUAL_ENV=[...]/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(raylet) Installed 99 packages in 442ms
(raylet) Installed 99 packages in 490ms
(raylet) Installed 99 packages in 512ms
(raylet) Installed 99 packages in 383ms
(raylet) Installed 99 packages in 622ms
(raylet) Installed 99 packages in 444ms
(raylet) Installed 99 packages in 349ms
(raylet) Installed 99 packages in 509ms
It's slightly annoying to have to make sure the Ray working directory is where the pyproject.toml is, and to specify excludes (including .venv, to avoid copying unnecessary data), but at least it works... RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 still looks like the more appealing / easier solution to me.
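In case it helps anyone copy this pattern, here is a filled-in sketch of the call above (the working_dir path and the extra exclude patterns are hypothetical; adjust them to your project):

import ray

# Point working_dir at the directory containing pyproject.toml, and exclude
# anything large or irrelevant (notably .venv) to stay under the 512 MiB
# package upload limit mentioned in the error above.
ray.init(
    runtime_env={
        "working_dir": "/home/coder/my_project",  # hypothetical path
        "excludes": [".git", ".venv", "data/", "*.parquet"],
    }
)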
A further comment from me to say that with ray==2.49.1 and without any customisation / special settings, I get these logs out:
2025-09-18 13:07:23,139 INFO worker.py:1951 -- Started a local Ray instance.
2025-09-18 13:07:23,162 INFO packaging.py:588 -- Creating a file package for local module '/home/coder/Research/Setup/uv_base'.
2025-09-18 13:07:23,190 INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_948115b5ce926b06.zip' (2.78MiB) to Ray cluster...
2025-09-18 13:07:23,200 INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_948115b5ce926b06.zip'.
(raylet) warning: `VIRTUAL_ENV=/home/coder/Research/Models/qis-model-factset/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
(raylet) Using CPython 3.11.13
(raylet) Creating virtual environment at: .venv
(raylet) warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
(raylet) If the cache and target directories are on different filesystems, hardlinking may not be supported.
(raylet) If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
(raylet) Installed 213 packages in 1.26s
Mapper 0, return value 0: [(b'of', 1), (b'is', 1)]
Mapper 0, return value 1: [(b'python,', 1), (b'peters', 1)]
Mapper 0, return value 2: [(b'the', 1), (b'zen', 1)]
Mapper 1, return value 0: [(b'unless', 1), (b'in', 1)]
Mapper 1, return value 1: [(b'although', 1), (b'practicality', 1)]
Mapper 1, return value 2: [(b'beats', 1), (b'errors', 1)]
Mapper 2, return value 0: [(b'is', 1), (b'is', 1)]
Mapper 2, return value 1: [(b'although', 1), (b'a', 1)]
Mapper 2, return value 2: [(b'better', 1), (b'than', 1)]
I.e. it no longer fails, but I wonder if there's a way to pass the --active argument to uv when it's invoked via Ray?
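One thing that might be worth trying, although I haven't verified it and the field may be experimental or version-dependent: recent Ray releases appear to expose a py_executable option in runtime_env that controls the command used to launch worker processes, which might allow forcing --active, e.g.:

import ray

# Untested sketch: if your Ray version supports the py_executable runtime_env
# field, it may let you control the exact `uv run` invocation used to start
# worker processes, including --active and --no-sync.
ray.init(
    runtime_env={
        "working_dir": ".",
        "py_executable": "uv run --active --no-sync",
    }
)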