torchrunx
Support debugger
https://github.com/microsoft/debugpy
https://code.visualstudio.com/docs/python/debugging#_remote-script-debugging-with-ssh
Basically the way this would work:
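- The selected worker (the one the debugger will connect to) starts the debugging server and waits for a connection right before executing `worker_args.function()`. All workers should also call a `barrier()` at that point, so that no worker runs ahead while the selected one is blocked waiting for the debugger.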
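  ```python
  import socket

  import debugpy

  # IP address of this worker's node (assumes the hostname resolves to it).
  local_ip: str = socket.gethostbyname(socket.gethostname())
  # Port 0 lets debugpy pick a free ephemeral port; listen() returns the bound (host, port).
  _, random_port = debugpy.listen((local_ip, 0))
  debugpy.wait_for_client()  # block until the debugger attaches
  ```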
- The launcher should open a TCP tunnel to that port via SSH local port forwarding, then print the local port the tunnel is mapped to.
- The user can then attach the Python debugger in VS Code to that local port.
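The worker-side ordering from the first step (selected worker listens and waits, then every worker hits the barrier, then runs the function) could look roughly like the sketch below. It is not torchrunx code: `run_with_debugger`, `debug_rank`, and `on_listen` are hypothetical names, and `barrier` stands in for something like `torch.distributed.barrier()`. It binds to loopback by default on the assumption that only the SSH tunnel needs to reach the server.

```python
from typing import Any, Callable, Optional


def run_with_debugger(
    rank: int,
    debug_rank: Optional[int],
    function: Callable[[], Any],
    barrier: Callable[[], object],
    host: str = "127.0.0.1",
    on_listen: Callable[[str, int], object] = print,
) -> Any:
    """Hypothetical wrapper around worker_args.function().

    Only the worker whose rank equals debug_rank starts a debugpy server.
    Every worker then waits at the barrier, so the others cannot run ahead
    while the selected worker is blocked until the debugger attaches.
    """
    if debug_rank is not None and rank == debug_rank:
        import debugpy  # imported lazily: only the debugged worker needs it

        # Port 0 makes debugpy choose a free ephemeral port; the bound
        # (host, port) is returned so it can be reported to the launcher.
        bound_host, port = debugpy.listen((host, 0))
        on_listen(bound_host, port)  # hypothetical hook: tell the launcher
        debugpy.wait_for_client()  # blocks until VS Code attaches
    barrier()  # assumption: e.g. torch.distributed.barrier() in torchrunx
    return function()
```

Since the barrier is called by every rank (including the debugged one, after `wait_for_client()` returns), all workers start `function()` together once the debugger is attached.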
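For the launcher step, a minimal sketch assuming the launcher shells out to the OpenSSH `ssh` client (torchrunx's own SSH handling may differ) and already knows the worker's hostname, the address the debug server is bound to, and its port. The helper names are hypothetical. `-L local:host:port` forwards `localhost:local` on the launcher machine to `host:port` as seen from the remote node, `-N` skips running a remote command, and `ExitOnForwardFailure=yes` makes `ssh` exit if the forward cannot be set up.

```python
import socket
import subprocess
from typing import List, Tuple


def find_free_local_port() -> int:
    # Binding to port 0 lets the OS pick an unused port. Another process
    # could grab it before ssh binds it; acceptable for a debugging aid.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def ssh_tunnel_command(
    hostname: str, worker_ip: str, worker_port: int, local_port: int
) -> List[str]:
    # Forward localhost:local_port (launcher) -> worker_ip:worker_port (seen from hostname).
    return [
        "ssh", "-N",
        "-o", "ExitOnForwardFailure=yes",
        "-L", f"{local_port}:{worker_ip}:{worker_port}",
        hostname,
    ]


def start_tunnel(
    hostname: str, worker_ip: str, worker_port: int
) -> Tuple[subprocess.Popen, int]:
    local_port = find_free_local_port()
    proc = subprocess.Popen(
        ssh_tunnel_command(hostname, worker_ip, worker_port, local_port)
    )
    print(f"Debugger: attach to localhost:{local_port}")
    return proc, local_port
```

The launcher would keep `proc` alive for the duration of the run and terminate it afterwards.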
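On the VS Code side, attaching uses an `attach` configuration in `.vscode/launch.json` pointing at the tunnel's local port. The sketch below just generates such a file in Python; it assumes the current Python Debugger extension (`"type": "debugpy"`; older setups used `"type": "python"`), and the configuration name and `remote_root` value are placeholders for wherever the code lives on the worker node.

```python
import json


def vscode_attach_config(local_port: int, remote_root: str) -> dict:
    # Attach (not launch) to the debugpy server reachable through the SSH tunnel.
    return {
        "name": "Attach to torchrunx worker",  # placeholder name
        "type": "debugpy",
        "request": "attach",
        "connect": {"host": "localhost", "port": local_port},
        # Map local source files to their paths on the worker so breakpoints line up.
        "pathMappings": [
            {"localRoot": "${workspaceFolder}", "remoteRoot": remote_root}
        ],
    }


def launch_json(config: dict) -> str:
    # Contents for .vscode/launch.json with a single configuration.
    return json.dumps({"version": "0.2.0", "configurations": [config]}, indent=2)


print(launch_json(vscode_attach_config(40000, "/path/to/project/on/worker")))
```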