[Feedback] Feedback for ray + uv
Hello everyone! As of Ray 2.43.0, we have launched a new integration with uv run that we are super excited to share with you all. This will serve as the main GitHub issue to track any issues or feedback that y'all might have while using it.
Please share any success stories, configs, or just cool discoveries that you might have while running uv + Ray! We are excited to hear from you.
To read more about uv + Ray, check out our new blog post here.
Hey y'all! It would be great to have more formal docs or a guide for getting this working besides the blog post. I don't know how to use our current JobConfig + anyscale.job.submit workflow with this new method.
@cabreraalex Thanks for your feedback! I'm currently working on the anyscale.job.submit workflow and will update here after that's deployed. And yes, you're right, we also need to work on more formal docs 👍
Hi. The ray Docker image does not include the uv binary, which blocks using the new feature in containerized setups.
docker run --rm -it rayproject/ray:2.43.0 sh
$ uv
sh: 1: uv: not found
@cabreraalex In the latest release 0.26.4 of the anyscale CLI (https://pypi.org/project/anyscale/), py_executable support is now implemented for JobConfig and the job submit workflow. You need a cluster image that has uv installed and also unsets RAY_RUNTIME_ENV_HOOK (that's a wrinkle we'd like to remove going forward), for example like this:
FROM anyscale/ray:2.43.0-slim-py312-cu125
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
RUN echo "unset RAY_RUNTIME_ENV_HOOK" >> /home/ray/.bashrc
and then you can use it, for example, as follows -- create a working_dir with the following files:
main.py
import ray

@ray.remote
def f():
    import emoji
    return emoji.emojize("Python rocks :thumbs_up:")

print(ray.get(f.remote()))
pyproject.toml
[project]
name = "test"
version = "0.1"
dependencies = ["emoji", "ray"]
job.yaml
name: test-uv
image_uri: <your image here>
working_dir: .
py_executable: "uv run"
entrypoint: uv run main.py
# If there is an error, do not retry.
max_retries: 0
And submit your job with anyscale job submit -f job.yaml. Instead of using a yaml you can also submit it via the SDK like
import anyscale
from anyscale.job.models import JobConfig
config = JobConfig(
    name="my-job",
    entrypoint="uv run main.py",
    working_dir=".",
    max_retries=0,
    image_uri="<your image here>",
    py_executable="uv run",
)
anyscale.job.submit(config)
Fantastic, will test it out, thanks!
Just found this ticket. I opened two tickets related to uv:
- https://github.com/ray-project/ray/issues/51196
- https://github.com/ray-project/ray/issues/51195
I'm also getting a uv: not found in the raylet logs of a ray cluster I started on EC2. Do I need to follow the same steps (unset RAY_RUNTIME_ENV_HOOK + install uv)? Do I need to do this on a) the head node, b) the worker nodes, or both? I'm using the rayproject/ray:2.43.0-py312-cpu image.
I have some issues using uv on a remote cluster:
#51368
I'm currently in the process of trying to use this. I have an ingress that handles multiple downstream models, with quite a few deps. When bringing up the cluster I use:
setup_commands:
- sudo apt-get update -y && sudo apt install -y espeak-ng espeak-ng-data libespeak-ng1 libpcaudio0 libportaudio2 libpq-dev
- curl -LsSf https://astral.sh/uv/install.sh | sh # Install uv
- echo 'export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook' >> ~/.bashrc
- pip install ray[all]==2.43.0
# Command to start ray on the head node.
head_start_ray_commands:
- ray stop
- >-
RAY_health_check_initial_delay_ms=999999999999999999
ray start
--head
--port=6379
--object-manager-port=8076
--autoscaling-config=~/ray_bootstrap_config.yaml
# Command to start ray on worker nodes.
worker_start_ray_commands:
- ray stop
- >-
RAY_health_check_initial_delay_ms=999999999999999999
ray start
--address=$RAY_HEAD_IP:6379
--object-manager-port=8076
and I start serve with uv run --verbose serve run deployments.ingress:router
Unfortunately it seems like it spends too much time redownloading / building deps, and ray eventually just decides to restart the raylet (causing an endless loop) or force-kills the worker causing it to crash.
The RAY_health_check_initial_delay_ms=999999999999999999 was an attempt around that, but no luck so far.
Thanks for writing such an awesome feature!
~~I'm giving it a try in an application with cuda/torch dependencies. For 2 workers, the application starts in around 1-2 minutes, but when I scale to 8 workers, it takes much longer and I see a lot of these messages:~~
(raylet, ip=W.X.Y.Z) [2025-03-16 19:32:22,168 E 4106941 4106941] (raylet) worker_pool.cc:581: Some workers of the worker process(4114829) have not registered within the timeout. The process is still alive, probably it's hanging during start.
~~Is there any way to debug what's going on or why it's taking so long for the other processes to start? Is it possible that the cache isn't working as expected?~~ (edit: turns out I just needed to propagate UV_CACHE_DIR to all the workers)
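For anyone hitting the same thing, here's roughly what that fix looks like as a runtime_env sketch; the shared cache path below is just an example, not an official default:
import ray

# Propagate the uv cache location to every worker so dependency downloads
# are reused instead of repeated per worker.
ray.init(
    runtime_env={
        "working_dir": ".",
        "py_executable": "uv run",
        "env_vars": {"UV_CACHE_DIR": "/mnt/shared/uv-cache"},  # example shared path
    }
)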
Also, as general feedback: could the Ray logger report back on the driver which dependencies all the workers ended up using, given that each one can run with a different py_executable?
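In the meantime, a rough workaround sketch is to ask the workers directly; report_env below is a throwaway helper, not a Ray API:
import ray
from importlib import metadata

@ray.remote
def report_env():
    # List the distributions visible in this worker's environment.
    return sorted(f"{d.metadata['Name']}=={d.version}" for d in metadata.distributions())

print(ray.get(report_env.remote()))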
Would it be possible to show the output of uv when submitting a job? Currently I'm just seeing:
uv run cli.py cluster show_deps
2025-03-17 11:28:33,592 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_48f447cb6ecbfe4b.zip.
2025-03-17 11:28:33,594 INFO packaging.py:575 -- Creating a file package for local module '.'.
Job submitted with ID show_deps--2025-03-17_11-28-33
Job submission server address: http://localhost:8265
and it's been stuck like that for a while; not sure if it's doing anything or if it's working on installing dependencies.
EDIT: it was taking a long time because I tried to be fancy and set the UV_CACHE_DIR to an EFS dir, which was really slow. I removed that and it was much faster
I've also noticed that when working_dir is large due to random untracked files in my Python project repo (close to the 100 MiB limit), the uv pip install on each worker can be very slow (presumably it's copying files for the install?). I can always add these files to .gitignore to bring the size down, but this seems cumbersome. Could there be a feature to ensure that only tracked files get uploaded? Or perhaps a way to configure the max working_dir size to something much smaller, so library maintainers can clamp down on these large uploads and give helpful error messages to the user?
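A partial workaround might be the runtime_env "excludes" field, which takes gitignore-style patterns; the patterns below are placeholders:
import ray

# Keep the uploaded working_dir small by excluding untracked junk explicitly.
ray.init(
    runtime_env={
        "working_dir": ".",
        "excludes": ["data/", "outputs/", "*.ckpt", ".git/"],  # placeholder patterns
    }
)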
@pcmoritz Hm, it seems like it's re-installing all dependencies each time a worker is invoked. Is there some way to avoid that? I'm also getting errors that it's failing, like:
(raylet, ip=172.31.81.185) errorerror: : failed to remove directory `/tmp/ray/session_2025-03-18_01-47-01_879621_311/runtime_resources/working_dir_files/_ray_pkg_686cd24b492b1031/.venv/lib/python3.12/site-packages/transformers/utils`: No such file or directory (os error 2)failed to remove directory `/tmp/ray/session_2025-03-18_01-47-01_879621_311/runtime_resources/working_dir_files/_ray_pkg_686cd24b492b1031/.venv/lib/python3.12/site-packages/transformers/utils`: No such file or directory (os error 2)
(raylet, ip=172.31.81.185) Uninstalled 1 package in 137ms [repeated 3x across cluster]
(raylet, ip=172.31.81.185) Installed 30 packages in 3.41s [repeated 4x across cluster]
EDIT: I added "env_vars": { "UV_PROJECT_ENVIRONMENT": "/home/ray/uv_shared" } to my runtime_env, which should make all of the actors use the same venv. It seems like it's working better, but I'm not sure it's the right solution.
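Spelled out, that workaround looks roughly like the sketch below (paths taken from the comment above; whether a shared venv is safe with concurrent workers is untested here):
import ray

runtime_env = {
    "working_dir": ".",
    "py_executable": "uv run",
    # All workers resolve into the same venv instead of one per working_dir copy.
    "env_vars": {"UV_PROJECT_ENVIRONMENT": "/home/ray/uv_shared"},
}
ray.init(runtime_env=runtime_env)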
When using the following
runtime_env = {
    'uv': 'requirements.txt',
}
ray.init('address', runtime_env=runtime_env)
I get this error from an overly long file name
error: failed to read from file
/tmp/ray/session_2025-03-19_12-12-38_586279_1/runtime_resources/uv/b4087477499d4d98f60ddd904f5146a19992f52e/exec_cwd/eyJ2ZXIiOjEsImlzdSI6MTc0MjQxMTg0NywiZW5jIjoiQTEyOEdDTSIsInRhZyI6ImJlWnZVWHI2RGFISXRhV3J2b2JlQmciLCJleHAiOjE3NDI0NTUwNDcsImFsZyI6IkExMjhHQ01LVyIsIml2IjoiSlVNTmxaemw1bkN0UklsZyJ9.vZ8egZQlA_pVoXh_Sjh4eQ.ZQdf-SopXIJ4q6Fe.Mu6xPuPr-PG21tdRV8aLNcspyWv0gtiH-dGhmUiKqaGtY5WzXr9Qs4bB5wyDK9b_bkT7LlZopfs8eeli15VXF_vgJ_WVbIfwdpKW-lJ4YUB2yrYCG9TTVXi5aAZgCSQe5H7tt1AZX1la-DWhNc3XEpSO-QSwwkZNnl70oVCex8W3nkWgXBkQkcQq4lhxvDJFBFupjZ9gLOr-Q4aY915RSTQGZzAtvkMIk77S7v1s3Omepb6N2wKZ2w95JsBG3wniHNrLp9zadWLxclWAQXlTAkLFMtmIEtLbqdKODL4X7Df5FJRIJo2Q8E5grrtgGSbF-awXkEzjD_7YB6-AYZ4s6zWcYQr5ckJXWufmsP1zURF8LLPZksN0THUiGZq2SXO_qXN1ebg_7o_IkOEmh2msxmvOE90XPJWcqvlWEBiwmeElEgwsZ_Qj34p8Onqo-Y_vWN4ZXmyzzmFX-lcYHxaYL1YRat4xexKPXUduG106cnEvxuH-FEwD9vHTnW4-F0_lX-45KOenCbCL9x8NCdrSTpssmfi7SmUs-wO6MHPqyE4CvVwXUtufcepP01CiHv6vfetC0EsOmMUsV77KOWbYL5qu56mrAoPMecN27VlQtLT31FZKyKLHiGM5ng1wtk6vNKKGtQ6azDEy1eQBaPKwMjf2xkwxyRCZhZ8CMNnyj94V-Q94pwdAjX5RYyVpV4nRVttrT492E3c_nyfabMOXDWuIWZlh5TS1fQWDAdQGFa6JHhgYv8WDdMpstorxOfj9VMtuWT-tOLlMa2Ai1Mmvv3UaPL3bG9W9m7nxU8a6HBR9Olv2IVQsiMt1RJ-JvTIHWFMkdpWMU2L0XMN4Pln-Bs6qxlpcbjge0jl6-VgwNFYIil4mpEGizNN6Y_a_u6vz-FWbQbhBKa7rWY_6ZE7IS2zpt1nKjMmnKC_Dt1x1w9Bu1Y_KF6g32KhS-a1a-FfnZncL8UFOlUQ3ONF8IPy_pcme_gP5MsjHWL2xUIrT0Hoc9Fw0NOEePVg052uvnpYCxQ8mmYig12mNRED7B6-CMCtFcBOIGgrwLc64LsUm3AeDh4VCJmNPbaags_xyEfGFObOn93a66FX7A4LFi59E0O6uzvydcWuiWdxU7V_EyvCDHaYZqvNqi88T03iPkVmHo-G4Up-4Zg8SkVvoUnCqWlyqCphTVP3QMeQfkPe2PYfPJzRHziNmlA7Fo-ztlCilhJ0d-LxK1i8xNFjf4jr7iSyqUyteFByGcWvBWu4pGE9pLWdiibeN97-STtL709ew9xH8k3j0AtbdWOfs5AYAn-Kz0kW4_0zXHt9GksyBRPVMaMF_I02BKA4.VsHV98LbeBm6HkzYQSC3uQ/apprise/index.html: File name too long (os error 36)
Hi everyone! Thanks for your feedback. We're going to start going through the issues listed.
In the meantime, I've created a uv GitHub issue tag to help triage uv-specific issues. If you decide to open a GitHub issue, I can help tag it correctly with uv. Thank you!
Opened an issue asking for uv in anyscale/ray Docker images.
https://github.com/ray-project/ray/issues/51592
We just released https://github.com/ray-project/ray/releases/tag/ray-2.44.0, which makes it possible to use the UV hook with Job submission (https://github.com/ray-project/ray/pull/51150), so you don't need to fall back to py_executable for job submission any more :)
In the latest master, uv run can now be used with the Ray Client too: https://github.com/ray-project/ray/pull/51683 :)
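A minimal sketch of what that can look like, with the hook enabled as in the blog post; the address and file name are placeholders:
# driver.py -- launch with:
#   RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook uv run driver.py
import ray

ray.init("ray://head-node:10001")  # placeholder Ray Client address

@ray.remote
def hello():
    import emoji  # resolved from the driver's uv project on the cluster side
    return emoji.emojize("Ray Client + uv :thumbs_up:")

print(ray.get(hello.remote()))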
Hello... I opened an issue reporting an error when running ray job submit: #51777
https://github.com/ray-project/kuberay/issues/3247
A community member opened this ticket for uv + KubeRay + Ray, and I am cross-posting it here for visibility since it is in a different repo.
I don't think this is specific to the new uv feature, but I am facing ConnectionAbortedError due to timeouts when synchronizing a large uv environment of ~200 packages. I don't see any easy way to increase the timeout for runtime environment initialization.
runtime_env = {
    'working_dir': '...',  # ~5MB, push takes ~1s
    'py_executable': 'uv run',
    'excludes': [...],
}
ray.init(..., runtime_env=runtime_env)
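The only candidate I can see is the runtime_env "config" field with setup_timeout_seconds, which is a documented Ray option, though I'm not sure it covers the uv sync triggered by py_executable; a sketch:
import ray

runtime_env = {
    "working_dir": ".",
    "py_executable": "uv run",
    # Give runtime environment setup more time (in seconds) before it is treated as failed.
    "config": {"setup_timeout_seconds": 1800},
}
ray.init(runtime_env=runtime_env)  # plus your usual address argument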
Not sure others are running into this, but I had poor performance using uv with many actors. I filed an issue against uv since I managed to repro in standalone example.
In my use case, the number of files needing to be hardlinked or symlinked in the dynamically created uv environment was so large that it took many minutes to start up. Even for my example of 4 popular top-level libraries, I still incur a large setup time (my actual use case would have more dependencies).
Hello! Thanks for the great work. I've just found out uv and ray and was super excited about the nice integration.
I'm trying out a rare combination of uv + ray across Windows and Linux nodes. While a multi-node Windows cluster doesn't seem to be officially supported, it worked well for the most part. However, when I run RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook uv run test.py (test.py from the official example) on a Linux machine and the job gets scheduled on the Windows machine, I get this error:
'"uv run"' �����ڲ����ⲿ���Ҳ���ǿ����еij���
The latter part seems to be garbled UTF-8 characters in my native language, but I guess it says
"uv run" is not recognized as an internal or external command, operable program or batch file.
This suggests that "uv run" is being handled as a whole program name rather than two parts, program "uv" and argument "run".
Running test.py without uv_runtime_env_hook works as expected: emoji package not found until I add it manually on my Windows worker. I'm not sure if this is a supported use case but I thought it's worth mentioning.
Ray version 2.46.0, Python version 3.11.11.
Adding some notes for those venturing here:
- Using `uv` with highly distributed applications naturally slams the index much faster than `pip` would. Though we haven't run into any issues on PyPI, be prepared to protect your self-hosted indices or pull-through caches. We consistently `429`ed the `rospi` indices until we resolved to our own index.
- Using `uv` with many dependencies, especially those which build at install time, will require increasing the various worker timeout settings. Without these increases, you will probably run into not-quite-infinite autoscaling thrashing.
- Compiling bytecode in deployed applications and containers is usually a good idea. Ray duplicates the environment for each worker. Using Ray with `uv` makes bytecode compilation go brr, but sometimes too brr for its own good. Some documentation and sane defaults in the Ray containers for this would be great!
- The `uv run` as a `py_executable` interface is super flexible - it lets you easily configure `groups` and even `projects`/`workspaces` (see the sketch after this list).
  - Considering these features are rather stable, I hope to see them integrated directly into the dependency management API. If they are, users will find it much easier to set up actor-specific dependencies, and perhaps even multi-workspace disjoint sets of dependencies, all with the magic and speed of `uv`.
- A documented example of how to efficiently use `cache-keys` along with `cache-dir` to responsibly maintain a cluster-, job-, or cloud-level cache for both OSS Ray deployments and Anyscale service deployments may be good for users to see.
- `uv run` syncs deps from the `uv.lock` file. Most reproducible environments have these. The Anyscale Ray Turbo / other magic injections are not ergonomic to this process.
  - We spent multiple hours initially debugging why jobs simply wouldn't start successfully. Parsing through the thousands of lines of `uv sync` logs (cue the `[repeated 75x across cluster]` slow-mo reel) revealed that our locked dependencies were incompatible with the Anyscale "magically injected" ones, and the `uv run` call was, expectedly, overwriting them.
  - Though making the Anyscale version of the `ray` Python package available in an Anyscale Cloud locked PyPI index could resolve some of this, it would still prevent users from maintaining a `uv.lock` file since the proprietary wheel would not be available in the developer's usual environment.
  - Instead, I'd suggest adding a `ray[anyscale]` extra to the public distribution which simply resolves the dependencies. Then, using the various internal hooks available, Anyscale could inject the actual code in the Anyscale Cloud environment at first `ray` import, for example.
- Speaking of log pollution, perhaps `quiet` should be another sane default? Or at least `no-progress`?
- `uv` doesn't work in Anyscale Workspaces just yet - heads up - but I hear it's coming soon!
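To make the groups point above concrete, a hypothetical sketch; the group name, and whether py_executable is honored at the per-actor level on your version, are assumptions:
import ray

# Hypothetical deployment-specific environment: this actor resolves the
# "serving" dependency group from pyproject.toml via uv, independent of the driver.
@ray.remote(runtime_env={"working_dir": ".", "py_executable": "uv run --group serving"})
class ServingActor:
    def ping(self) -> str:
        return "ok"

actor = ServingActor.remote()
print(ray.get(actor.ping.remote()))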
Sorry for the wall of text - hopefully some small snippet of this is helpful to at least one person! 😅
I encountered an incredibly strange Windows issue with this. Still not sure exactly what is happening. I have a Hello World ray script in a UV environment.
In Powershell, I can run uv run main.py, and I will get:
PS C:\Users\jonat\Files\dummy> uv run main.py
2025-07-23 13:11:16,098 INFO worker.py:1927 -- Started a local Ray instance.
Hello World
In Command Prompt I can run uv run main.py, and I will get:
C:\Users\jonat\Files\dummy>uv run main.py
2025-07-23 13:12:12,345 INFO worker.py:1927 -- Started a local Ray instance.
2025-07-23 13:12:12,406 INFO packaging.py:588 -- Creating a file package for local module 'C:\Users\jonat\Files\dummy'.
2025-07-23 13:12:12,442 INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_41600691f764270d.zip' (0.32MiB) to Ray cluster...
2025-07-23 13:12:12,444 INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_41600691f764270d.zip'.
(raylet) '"uv run"' is not recognized as an internal or external command,
(raylet) operable program or batch file.
Unsure why this would matter. I figured it might be a PATH thing, but I tried adding uv to my system PATH more explicitly and it didn't seem to make a difference. Hopefully this helps anyone else who stumbles into this. (Also, it's worth noting that if you're working in a Jupyter Notebook/Lab, it matters which shell started the Notebook/Lab.)
I was not able to run uv + ray remotely using only ray.init(). Is there a way to install all dependencies even when I don't want to use the Ray Jobs API? When using ray.init("ray://my-grpc-ingress-hostname") I get ray.exceptions.RayTaskError(ModuleNotFoundError) on the first import; when using JobSubmissionClient it works fine.
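For context, the explicit form I would expect to work looks roughly like the sketch below (no uv hook on the driver, everything declared in runtime_env; the port is a guess):
import ray

ray.init(
    "ray://my-grpc-ingress-hostname:10001",  # placeholder port
    runtime_env={
        "working_dir": ".",          # project with pyproject.toml / uv.lock
        "py_executable": "uv run",   # workers resolve deps through uv
    },
)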
I have some strong opinions about this as someone who loves both uv and ray separately.
ray should not change its behavior based on whether a file is run with python or uv run or some other way. This is co-opting a uv command to change our behavior, and why do that? It opens up a lot of potential confusion about where any issues are coming from.
Breaking uv user expectations
As a uv user, I expect a very high level of similarity between the behavior of uv run your_file.py and uv sync && source .venv/bin/activate && python file.py
Breaking ray user expectations
As a ray user, I expect to not have a runtime_env in ray.init() unless I specified it and configured it in my code or at the CLI.
An alternative
ray could have its own command that implements the behavior we're discussing here. I don't know, something like ray run --distribute-venv (first thing that came to mind... could be much better).
I love ray and spend so much time working with rllib. I want to see it improved, but changing behavior based on whether or not uv run was used to start my program is just confusing and messy, in my opinion.
As of Ray 2.51.1, are we still required to install uv ourselves on the Ray cluster?
Hello! The newly introduced uv integration is really cool and our codebase relies heavily on it. But it seems to lack error handling for some corner cases: #59342. Would you guys like to take a look?