ipykernel can start debugpy on an occupied port
I am troubleshooting some flaky tests for our jupyter notebook debugger in vscode. I saw an instance where we tried to start the debugger and never got a response. I see this message in the log:
2022-08-31T09:09:49.4948164Z info 9:9:49.487: Debug IO Pub message: {"header":{"msg_id":"7c8c5c89-6f267f41f37c03b50f0f97d9_5315_29","msg_type":"error","username":"runner","session":"7c8c5c89-6f267f41f37c03b50f0f97d9","date":"2022-08-31T09:09:49.481276Z","version":"5.3"},"msg_id":"7c8c5c89-6f267f41f37c03b50f0f97d9_5315_29","msg_type":"error","parent_header":{"msg_id":"7c8c5c89-6f267f41f37c03b50f0f97d9_5315_27","msg_type":"execute_request","username":"runner","session":"7c8c5c89-6f267f41f37c03b50f0f97d9","date":"2022-08-31T09:09:48.672109Z","version":"5.3"},"metadata":{},"content":{"traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)","Input \u001b[0;32mIn [1]\u001b[0m, in \u001b[0;36m<cell line: 1>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mdebugpy\u001b[39;00m;\u001b[43mdebugpy\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlisten\u001b[49m\u001b[43m(\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m127.0.0.1\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[38;5;241;43m42323\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m/opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/public_api.py:31\u001b[0m, in \u001b[0;36m_api.<locals>.apply.<locals>.wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mdebugpy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mserver\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m api\n\u001b[1;32m 30\u001b[0m wrapped \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(api, f\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m)\n\u001b[0;32m---> 31\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mwrapped\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m/opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/server/api.py:140\u001b[0m, in \u001b[0;36m_starts_debugging.<locals>.debug\u001b[0;34m(address, **kwargs)\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m func(address, settrace_kwargs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 139\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[0;32m--> 140\u001b[0m \u001b[43mlog\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mreraise_exception\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;132;43;01m{0}\u001b[39;49;00m\u001b[38;5;124;43m() failed:\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfunc\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;18;43m__name__\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43minfo\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m/opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/server/api.py:138\u001b[0m, in \u001b[0;36m_starts_debugging.<locals>.debug\u001b[0;34m(address, **kwargs)\u001b[0m\n\u001b[1;32m 135\u001b[0m settrace_kwargs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdont_trace_end_patterns\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m (\u001b[38;5;28mstr\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdebugpy_launcher.py\u001b[39m\u001b[38;5;124m\"\u001b[39m),)\n\u001b[1;32m 137\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 138\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[43maddress\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msettrace_kwargs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 139\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[1;32m 140\u001b[0m log\u001b[38;5;241m.\u001b[39mreraise_exception(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{0}\u001b[39;00m\u001b[38;5;124m() failed:\u001b[39m\u001b[38;5;124m\"\u001b[39m, func\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m, level\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minfo\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n","File \u001b[0;32m/opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/server/api.py:240\u001b[0m, in \u001b[0;36mlisten\u001b[0;34m(address, settrace_kwargs)\u001b[0m\n\u001b[1;32m 237\u001b[0m log\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mEndpoints received from adapter: \u001b[39m\u001b[38;5;132;01m{0}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m, json\u001b[38;5;241m.\u001b[39mrepr(endpoints))\n\u001b[1;32m 239\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124merror\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01min\u001b[39;00m endpoints:\n\u001b[0;32m--> 240\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\u001b[38;5;28mstr\u001b[39m(endpoints[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124merror\u001b[39m\u001b[38;5;124m\"\u001b[39m]))\n\u001b[1;32m 242\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 243\u001b[0m server_host \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mstr\u001b[39m(endpoints[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mserver\u001b[39m\u001b[38;5;124m\"\u001b[39m][\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhost\u001b[39m\u001b[38;5;124m\"\u001b[39m])\n","\u001b[0;31mRuntimeError\u001b[0m: Can't listen for client connections: [Errno 98] Address already in use"],"ename":"RuntimeError","evalue":"Can't listen for client connections: [Errno 98] Address already in use"},"buffers":[],"channel":"iopub"}
2022-08-31T09:09:49.4981656Z info 9:9:49.489: [32mWidget Coordinator received IPyWidgets_msg_received
If I clean the error message up a bit...
Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 import debugpy;debugpy.listen((\"127.0.0.1\",42323))
","File /opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/public_api.py:31, in _api.<locals>.apply.<locals>.wrapper(*args, **kwargs)
28 from debugpy.server import api
30 wrapped = getattr(api, f.__name__)
---> 31 return wrapped(*args, **kwargs)
","File /opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/server/api.py:140, in _starts_debugging.<locals>.debug(address, **kwargs)
138 return func(address, settrace_kwargs, **kwargs)
139 except Exception:
--> 140 log.reraise_exception(\"{0}() failed:\", func.__name__, level=\"info\")
","File /opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/server/api.py:138, in _starts_debugging.<locals>.debug(address, **kwargs)
135 settrace_kwargs[\"dont_trace_end_patterns\"] = (str(\"debugpy_launcher.py\"),)
137 try:
--> 138 return func(address, settrace_kwargs, **kwargs)
139 except Exception:
140 log.reraise_exception(\"{0}() failed:\", func.__name__, level=\"info\")
","File /opt/hostedtoolcache/Python/3.10.6/x64/lib/python3.10/site-packages/debugpy/server/api.py:240, in listen(address, settrace_kwargs)
237 log.info(\"Endpoints received from adapter: {0}\", json.repr(endpoints))
239 if \"error\" in endpoints:
--> 240 raise RuntimeError(str(endpoints[\"error\"]))
242 try:
243 server_host = str(endpoints[\"server\"][\"host\"])
","RuntimeError: Can't listen for client connections: [Errno 98] Address already in use
I see that ipykernel picks a free port here, and sends it to debugpy https://github.com/ipython/ipykernel/blob/cce9b7d736d2cdd32e8695436dbed77bc7912080/ipykernel/debugger.py#L231-L239
and in the error message, debugpy is saying that the port is not free. I know it's not really possible to do this kind of thing in an atomic way. Is it possible for ipykernel to detect this error and retry with a new port? Should debugpy have a mode where it can pick a free port and return it? Or should a client like vscode detect this error and retry?
Our test run: https://github.com/microsoft/vscode-jupyter/runs/8109923123?check_suite_focus=true
Not sure if it helps, but here is a link to how jupyter-server finds a port with retries; it not only checks for address being already in use but also for permission errors.