[Ray debugger] Unable to use debugger on Ray Cluster on k8s
What happened + What you expected to happen
I tried to use the debugger extension in VS Code following the guide (https://www.anyscale.com/blog/ray-distributed-debugger), but when I click a paused task to attach the VS Code debugger, I always get the error `connect ECONNREFUSED $ip:port`.
When I run the extension against a local cluster, it works normally.
I also tried adding the `--ray-debugger-external` flag and confirmed that the Ray cluster on k8s can enable the native debugger.
I don't know how to use the VS Code debugger extension against a Ray cluster on k8s. Can you provide relevant guidance or help?
Versions / Dependencies
Ray 2.23.0, Python 3.10.12
Reproduction script
The sample code from the guide linked above.
Issue Severity
High: It blocks me from completing my task.
Or do I need to configure `launch.json` in VS Code?
I think the problem is that the Ray debugger uses a random port, so it's not possible to know ahead of time which port to open when running on Kubernetes.
From https://github.com/ray-project/ray/blob/master/python/ray/util/debugpy.py:
```python
def _ensure_debugger_port_open_thread_safe():
    (...)
    (host, port) = debugpy.listen(
        (ray._private.worker.global_worker.node_ip_address, 0)
    )
```
And from the definition of `listen()` in https://github.com/microsoft/debugpy/blob/main/src/debugpy/public_api.

> This may be different from address if port was 0 in the latter, in which case the adapter will pick some unused ephemeral port to listen on.
In our case we're running ephemeral Ray clusters using the RayJob resource definition from KubeRay, so we could specify a single port. For static Ray clusters, could a port range be a solution?
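To sketch the port-range idea (illustrative only, not Ray's actual API): instead of passing port 0 to `debugpy.listen` and getting an unpredictable ephemeral port, the caller could probe a fixed range and pass the first free port, so that range can be opened in the Kubernetes pod spec ahead of time. `find_open_port` is an invented helper, not part of Ray.

```python
# Illustrative sketch: probe a fixed port range so the chosen port is known to
# lie in a range that can be opened in the Kubernetes pod spec ahead of time.
import socket

def find_open_port(host: str, start: int = 50000, end: int = 51000) -> int:
    """Return the first port in [start, end) that can be bound on host."""
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # port in use, try the next one
            return port
    raise RuntimeError(f"no free port in {start}-{end}")

# Ray's debugpy.py could then do something like:
#   port = find_open_port(node_ip_address)
#   debugpy.listen((node_ip_address, port))
```

Note there is a small race between probing a port and debugpy re-binding it; a real implementation would presumably retry on failure.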
@brycehuang30 does the new distributed debugger have this capability? if we don't I say we build forward and add this as a feature request to that.
The distributed debugger currently cannot customize the debugging ports. I think we could solve this in two steps:
- let the debugger use only a fixed range of ports, e.g. 50000-51000, so users can open those ports in k8s
- allow a user-configurable port range, so users can choose the range themselves
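The second step could be as simple as reading the range from an environment variable. The variable name `RAY_DEBUGPY_PORT_RANGE` below is invented for illustration, not an existing Ray setting.

```python
# Hypothetical sketch of a user-configurable port range; the variable name
# RAY_DEBUGPY_PORT_RANGE is invented for illustration.
import os

def parse_port_range(default=(50000, 51000)):
    """Parse 'LOW-HIGH' from the environment, falling back to default."""
    raw = os.environ.get("RAY_DEBUGPY_PORT_RANGE")
    if not raw:
        return default
    low, high = (int(part) for part in raw.split("-"))
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {raw!r}")
    return low, high
```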
Thanks for looking into this. One more thing that needs to be considered when debugging a job running in Kubernetes is the IP address.
From the Ray job log:

```
2024-10-03 11:07:18,021 INFO debugpy.py:66 -- Ray debugger is listening on 100.104.4.3:34983
2024-10-03 11:07:18,023 INFO debugpy.py:87 -- Waiting for debugger to attach...
```
That IP address 100.104.4.3 is internal to the Kubernetes cluster, so when trying to attach from VS Code I get a connection error.
(In this case 127.0.0.1:8265 is being port-forwarded from the Ray dashboard running in Kubernetes.)
Possibly the VS Code debugger extension should connect to the external IP address of the head node rather than the internal node IP address?
I've run into an issue that seems very similar to this one. In fact, it might very well be the same issue.
I'm using Ray 2.30 and I get a connection refused error when I try to connect VS Code to the paused task. I noticed that debugpy on the task actually crashes soon after `debugpy.listen(...)` is called, so by the time I try to connect VS Code, nothing is listening on the configured port anymore (the port printed in the `Ray debugger is listening on <ip>:<port>` log message).
- I also tried Ray 2.39: same issue.
- I tried patching ray to make debugpy run on a fixed port, and/or localhost/0.0.0.0 ip (combined with kubectl port forwarding): all to no avail. In all cases, the root issue seems that nothing is listening anymore on the port where debugpy is supposed to listen.
- I tried running `debugpy.listen` on a k8s pod without Ray, and in that case it works fine: using `lsof` I can see that something is listening on the configured port.
- The underlying debugpy crash is hard to detect, apart from the fact that nothing is listening on the port. However, if you enable extra logging you can see it crash (`BrokenPipeError`) in the logs. I reported this issue in debugpy here (with details on how to find the crash message in `debugpy.pydevd.NNNN.log`): https://github.com/microsoft/debugpy/issues/1749
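A quick way to reproduce the symptom described above (nothing accepting connections on the advertised port) without `lsof` is a plain TCP probe; this is a generic check, unrelated to Ray's internals:

```python
# Generic TCP probe: check whether anything is accepting connections on the
# host:port that the "Ray debugger is listening on <ip>:<port>" log advertises.
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False right after the "Waiting for debugger to attach..." log line, the debugpy server has already died, which matches the `BrokenPipeError` crash reported above.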
> The distributed debugger currently cannot customize the debugging ports. I think we could solve this in two steps:
> - let the debugger use only a fixed range of ports, e.g. 50000-51000, so users can open those ports in k8s
> - allow a user-configurable port range, so users can choose the range themselves
We are facing the same issue with our local Ray cluster, in our case behind docker-compose for local development/testing.
We were wondering whether the suggested solution, an optional parameter to specify debugpy ports, is still an option, or whether there is any other recommended way to overcome the issue.
We ended up deploying https://docs.linuxserver.io/images/docker-code-server/ inside the Kubernetes cluster, which can then access the necessary ports
@rasmus-unity Thank you for sharing. Can you explain the specific steps?
I also noticed there is a relevant PR (https://github.com/ray-project/ray/pull/49116). Can I assume this requirement can be met by following that document? cc @brycehuang30
@rasmus-unity and @Moonquakes, thank you for your insights!
To test this, we created a Dockerfile based on the Ray images and installed the SSH server as described in #49116, along with other necessary components. Since we were looking for a solution for agile local development and debugging, we also ended up mounting the source code under development as volumes on the Ray head node and installing various tools we need for development, such as Devbox. This setup allowed us to develop directly on the Ray head and use the Ray Distributed Debugger extension, but we believe it adds a lot of complexity, aside from installing otherwise unnecessary software on the Ray head, that could potentially be avoided.
While this approach was useful and does the trick for us for the moment, we still believe an out-of-the-box solution, without the need to install SSH servers and other dependencies, would be extremely valuable on top of the already excellent Ray Distributed Debugger extension. In our opinion, implementing a way to configure a range of ports for debugpy to listen on, as previously suggested by @brycehuang30, would greatly enhance the development experience.
Hi @rogerfydp, could you explain your setup steps and Dockerfile in more detail? I installed SSH following the instructions in https://github.com/ray-project/ray/pull/49116 and opened port 22, but other problems seem to arise: KubeRay opens some ports by default when no ports are specified, but once port 22 is added manually the defaults are no longer added (https://github.com/ray-project/kuberay/blob/v1.2.2/ray-operator/controllers/ray/common/service.go#L409-L417).
Hi @pcmoritz, would we consider supporting the distributed debugger in KubeRay environments without installing SSH? SSH is a heavy dependency and may introduce security risks. A better mechanism might be to expose a fixed set of ports externally and have the debugger listen only on those ports.
It seems that there is an external plugin that can already achieve this function, FYI: https://ray.slack.com/archives/C01DLHZHRBJ/p1722501457132069, https://github.com/zen-xu/plan-d
The use of SSH in https://github.com/ray-project/ray/pull/49116 is just an example, and you don't actually need to use SSH: you can run the VS Code server inside the cluster in any way you like (maybe as a sidecar, or directly in the Ray container, or even as a separate deployment). The https://docs.linuxserver.io/images/docker-code-server/ image suggested above could be an option, or you can probably also run the upstream https://code.visualstudio.com/docs/remote/vscode-server. In those cases you just need to expose the VS Code frontend to the users.
If somebody has such a setup and wants to contribute a PR, that would be most welcome! You could e.g. change the KubeRay tab in https://github.com/ray-project/ray/pull/49116 to KubeRay (SSH) and add another tab with KubeRay (VS Code Server) with your instructions :)
@pcmoritz Ah, I see. I meant that I want users to be able to use their local VS Code, not a VS Code frontend that we deploy on the server side. All the user should need to do is port-forward some port and have the local VS Code extension connect to it.