Cannot set breakpoints in UDFs using PyCharm with the embedded server
Description
It is possible to use PyCharm's debugger to debug DH code running on the embedded server directly from PyCharm. It is expected that the debugger would respect any breakpoints in user-defined functions (UDFs) that get called in query strings. However, this is not always the case. The debugger does not respect these breakpoints in the simple cases of calling the UDF with update_view or calling the UDF on a ticking table.
Steps to reproduce
- Make a brand-new, empty directory. I will call mine test-debug.
- Within that directory, create a new Python virtual environment. This will hold the installation of the embedded server and pytest. It might be the case that this bug is version-dependent, so I am going to create one venv for DH 0.36.1, and one venv for 0.35.2. In each venv, install the corresponding version of the embedded server.
# create venv for 0.36.1
alexpeters@Alexs-MBP-2 test-debug % python3 -m venv 0.36.1-venv
alexpeters@Alexs-MBP-2 test-debug % source 0.36.1-venv/bin/activate
(0.36.1-venv) alexpeters@Alexs-MBP-2 test-debug % pip3 install deephaven-server==0.36.1
(0.36.1-venv) alexpeters@Alexs-MBP-2 test-debug % pip3 install pytest
(0.36.1-venv) alexpeters@Alexs-MBP-2 test-debug % deactivate
# create venv for 0.35.2
alexpeters@Alexs-MBP-2 test-debug % python3 -m venv 0.35.2-venv
alexpeters@Alexs-MBP-2 test-debug % source 0.35.2-venv/bin/activate
(0.35.2-venv) alexpeters@Alexs-MBP-2 test-debug % pip3 install deephaven-server==0.35.2
(0.35.2-venv) alexpeters@Alexs-MBP-2 test-debug % pip3 install pytest
(0.35.2-venv) alexpeters@Alexs-MBP-2 test-debug % deactivate
- Create a new PyCharm project from the test-debug directory using the Open command on the PyCharm launch screen.
- Add a new interpreter to your PyCharm project by clicking the interpreter in the bottom-right corner, selecting Add new interpreter > Add local interpreter > Existing, and navigating to the venv we just created. Do this for both venvs.
- Add the following 3 Python files to the root directory of the project:
First, this is a simplified reproducer, and might be a more straight-line path to finding the problem. simplified.py:
# unconventional import to verify package version
import deephaven_server as dh
print(dh.__version__)

s = dh.server.Server(port=10000, jvm_args=[
    "-Xmx16g",
    "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
    "-Dprocess.info.system-info.enabled=false"]
)
s.start()

from deephaven import empty_table, time_table

static_t = empty_table(10).update("X = ii")
ticking_t = time_table("PT1s").update("X = ii")

def udf(x):
    """User-defined function, set a breakpoint on the next line."""
    a = x+1
    return a

# uncomment the case you want to evaluate
test_1 = static_t.update("Y = udf(X)")
#test_2 = ticking_t.update("Y = udf(X)")
#test_3 = static_t.update_view("Y = udf(X)")
#test_4 = ticking_t.update_view("Y = udf(X)")
Next, this is a pytest configuration file required by the actual use case in which the problem presented itself. conftest.py:
import os

import pytest
from _pytest.main import Session

# unconventional import to verify package version
import deephaven_server as dh


def start_server(port: int = 10000) -> dh.server.Server:
    print(dh.__version__)
    extra_classpath: list[str] = []
    if (path := os.environ.get("EXTRA_CLASSPATH")) is not None:
        extra_classpath.append(path)
    server = dh.server.Server(
        port=port,
        jvm_args=[
            "-Xmx4g",
            "-Dprocess.info.system-info.enabled=false",
            "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
        ],
        extra_classpath=extra_classpath,
    )
    server.start()  # type: ignore
    return server


def pytest_sessionstart(session: Session) -> None:
    start_server(port=0)
This is a simple repro of the actual use case using pytest. usecase.py:
import pytest

def udf(x):
    """User-defined function, set a breakpoint on the next line."""
    a = x+1
    return a

def test_static_update():
    """Static table, UDF in a call to `update`, invoked via pytest."""
    from deephaven import empty_table
    t = empty_table(10).update("X = ii")
    t2 = t.update("Y = udf(X)")

def test_ticking_update():
    """Ticking table, UDF in a call to `update`, invoked via pytest."""
    from deephaven import time_table
    t = time_table("PT1s").update("X = ii")
    t2 = t.update("Y = udf(X)")

def test_static_update_view():
    """Static table, UDF in a call to `update_view`, invoked via pytest."""
    from deephaven import empty_table
    t = empty_table(10).update("X = ii")
    t2 = t.update_view("Y = udf(X)")

def test_ticking_update_view():
    """Ticking table, UDF in a call to `update_view`, invoked via pytest."""
    from deephaven import time_table
    t = time_table("PT1s").update("X = ii")
    t2 = t.update_view("Y = udf(X)")
- Let's start with the simplified reproducer. Ensure you have a Python interpreter selected for the version that you want to test. Next, set a breakpoint on line 19, inside of udf (the line a = x+1). Right-click on the filename simplified.py towards the top of the screen and click Debug 'simplified'. This will start the debugger, and the breakpoint that you set will be hit. This is good.
- Exit the debugger, comment out line 23 (test_1), and uncomment line 24 (test_2). Then, engage the debugger as before. This will test the same udf with a ticking table. Unfortunately, the breakpoint will not be hit, and the debugger will roll through as though there was no breakpoint present. This is bad. The same is true for lines 25 and 26 (test_3 and test_4).
- usecase.py is a simple repro of the actual problem that the user reported using pytest. I created both versions of this repro to see if pytest is really relevant here, and it appears that it is not. Set a breakpoint on line 5 of usecase.py (the line a = x+1 inside udf), right-click the filename, and click Debug 'Python tests in usecase.py'. The debugger will start, and you'll hit the breakpoint at the first test case. However, after you step through this case, the other 3 test cases will not respect the breakpoint.
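Since the steps above depend on each venv holding the intended release, a quick check of what is actually installed in the selected interpreter can rule out a mix-up. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution names are taken from the pip commands in the steps.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "deephaven-server" and "pytest" are the names used with pip3 install above.
    for name in ("deephaven-server", "pytest"):
        print(f"{name}: {installed_version(name)}")
```

Running this with each venv's interpreter should report the version that was pinned for that venv, or None if the install step was skipped.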
Expected results
Every single test should respect the breakpoints set in the UDFs.
Actual results
The breakpoints are not respected with a ticking table, or when using update_view.
Additional details and attachments
The user claimed that this all worked in 0.35.2, but I have not been able to verify that. I see exactly the same behavior on 0.35.2.
Here are the relevant versions:
pydevd: 232.9921.89
PyCharm: 2023.2.2, build 232.9921.89
Python: 3.12
When I looked at this with a much older version of DH, it required pydevd_pycharm. Has this been tried?
See: https://github.com/chipkent/test-dh-docker-pycharm The repo above has links to further documentation.
Here is the similar test repo for VS Code: https://github.com/chipkent/test-dh-docker-vscode
One of the two was easier to make work than the other. I think it was PyCharm.
In the reproducer above, I don't even see that pydevd has been installed or configured.
See: https://deephaven.io/core/docs/how-to-guides/debugging/pycharm-debugging/
From these docs, it looks like pydevd may be an alternative to pydevd_pycharm, but that should be verified.
From my prior work on this, if these packages are not installed and set up right, there are certain circumstances when the debugger will not engage.
When we did the original PR, I think the lack of breakpoints in threads was related to threads created by Java not getting the debugger set up correctly. I think it is this file in the original PR.
https://github.com/deephaven/deephaven-core/pull/3075/files#diff-a71d6158806eeae368f25c9cc07d17b700047318ff45d3a8b0737c81113bac44
I would add some print code to make sure that (1) a debugger exists (pydevd or pydevd_pycharm) and (2) settrace is eventually called on all of the threads.
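A minimal sketch of that kind of print check, using only the standard library. The function names debugger_status and report are hypothetical. One caveat: sys.gettrace() only reflects tracing installed through sys.settrace on the calling thread, so a debugger that uses the sys.monitoring API on Python 3.12+ would not show up in the trace_installed field.

```python
import sys
import threading


def debugger_status() -> dict:
    """Collect a few cheap signals about the debugger for the calling thread."""
    return {
        "thread": threading.current_thread().name,
        # True once the module has been imported anywhere in this process.
        "pydevd_loaded": "pydevd" in sys.modules,
        "pydevd_pycharm_loaded": "pydevd_pycharm" in sys.modules,
        # Only reflects sys.settrace-based tracing on this thread.
        "trace_installed": sys.gettrace() is not None,
    }


def report(label: str) -> dict:
    """Print and return the status so it can be called from anywhere."""
    status = debugger_status()
    print(f"[{label}] {status}")
    return status
```

Calling report("main") right after the server starts and report("udf") as the first line of the UDF would show whether the thread that executes the UDF sees the same debugger state as the main thread.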
My understanding of the way that our user is setting things up is that pydevd is completely bypassed. Trying to go that route led me on a goose chase that was not fun. If we want to recommend the pydevd approach, that is a different question.
Based on java_threads.py, I don't have any expectation that debugging will work on Java-spawned threads without pydevd or pydevd_pycharm.
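The underlying Python behavior can be illustrated without Deephaven. In CPython, a trace function set with sys.settrace applies only to the calling thread, and the threading.settrace hook only runs for threads started through threading.Thread. The sketch below uses _thread.start_new_thread as a stand-in for a thread created outside the threading module; that is an analogy for a JVM-created thread, not a description of what java_threads.py does. All names here are hypothetical, and the result may differ when run under a debugger that patches thread creation.

```python
import _thread
import sys
import threading


def noop_tracer(frame, event, arg):
    """Placeholder trace function; a real debugger would install its own."""
    return None


def foreign_thread_is_traced(install_tracer: bool) -> bool:
    """Run a probe on a thread started outside threading.Thread."""
    done = threading.Event()
    result = {}

    def body():
        if install_tracer:
            # What a debugger hook would have to do on the thread itself.
            sys.settrace(noop_tracer)
        result["traced"] = sys.gettrace() is not None
        done.set()

    _thread.start_new_thread(body, ())
    if not done.wait(timeout=5):
        raise TimeoutError("probe thread did not finish")
    return result["traced"]


if __name__ == "__main__":
    print("without explicit settrace:", foreign_thread_is_traced(False))
    print("with explicit settrace:", foreign_thread_is_traced(True))
```

The point of the comparison is that a thread the threading module never saw has no trace function unless something calls settrace on that thread explicitly.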
To be clear, the steps to reproduce specifically call out running the system with PyCharm, so pydevd_pycharm is provided automatically by the debugger, right?
We've historically found that sometimes pydevd works better than pydevd_pycharm even from PyCharm; it is just a matter of which is less buggy for a given release.
Do we have another debugger that we want to work outside the pydevd family?
The user is having this problem with:
pydevd: 241.18968.29
PyCharm: 2024.1.5
Python: 3.10.12
I have done some extensive testing and have not yet been able to get what I would consider to be a failure.
I created a venv using a local script I have:
deephaven: 0.37.0+snapshot
pycharm: 2023.3.3 pro
pydev: 233.13763.11
python: 3.12
Here are some examples that work as I expect:
from deephaven_server import Server

server = Server(
    port=10000,
    jvm_args=[
        "-Xmx4g",
        "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
    ],
)
server.start()

import deephaven
from deephaven import time_table

print(deephaven.__version__)

def func() -> int:
    print("in func")
    return 1

tt = time_table("PT1s")
tt2 = tt.update("X = func()")

from time import sleep
sleep(15)
print("DONE")

from deephaven_server import Server

server = Server(
    port=10000,
    jvm_args=[
        "-Xmx4g",
        "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
    ],
)
server.start()

from deephaven import time_table

def func() -> int:
    print("in func")
    return 1

tt = time_table("PT1s").update_view("X = func()")

from time import sleep
sleep(15)
print("DONE")

from deephaven_server import Server

server = Server(
    port=10000,
    jvm_args=[
        "-Xmx4g",
        "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
    ],
)
server.start()

from deephaven import time_table

def func() -> int:
    print("in func")
    return 1

tt = time_table("PT1s").update_view("X = func()").select()

from time import sleep
sleep(15)
print("DONE")

from deephaven_server import Server

server = Server(
    port=10000,
    jvm_args=[
        "-Xmx4g",
        "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
    ],
)
server.start()

from deephaven import time_table

def func() -> int:
    print("in func")
    return 1

tt = time_table("PT1s").update_view("X = func()")

from deephaven.pandas import to_pandas
df = to_pandas(tt)

# from time import sleep
# sleep(5)
# df2 = to_pandas(tt)
#
# sleep(15)
print("DONE")
I then took the simplified.py example from the issue description and modified it. With my modifications, the case behaves as expected. See the comments added to the example.
# unconventional import to verify package version
import deephaven_server as dh
print(dh.__version__)

s = dh.server.Server(port=10000, jvm_args=[
    "-Xmx16g",
    "-DAuthHandlers=io.deephaven.auth.AnonymousAuthenticationHandler",
    "-Dprocess.info.system-info.enabled=false"]
)
s.start()

from deephaven import empty_table, time_table

static_t = empty_table(10).update("X = ii")
ticking_t = time_table("PT1s").update("X = ii")

def udf(x):
    """User-defined function, set a breakpoint on the next line."""
    a = x+1
    return a

# uncomment the case you want to evaluate
# test_1 = static_t.update("Y = udf(X)") # works fine
# test_2 = ticking_t.update("Y = udf(X)") # shutdown race -- see below
# test_3 = static_t.update_view("Y = udf(X)") # should not break in the UDF because the cells are never evaluated
# test_4 = ticking_t.update_view("Y = udf(X)") # should not break in the UDF because the cells are never evaluated

# adding the sleep avoids a shutdown race so test_2 hits the UDF breakpoint
from time import sleep
sleep(15)
... so based on my analysis, I think that the unit test has a few problems.
- The cases using update_view never evaluate the cells, so they never call the UDF. Adding a select at the end will trigger the eval.
- The ticking tables with update have a race condition where the program may exit before the eval is called. To handle these calls in the deephaven test suite, wait_ticking_table_update is used. See https://github.com/deephaven/deephaven-core/blob/97780f42716a2576a8a181259bd3b945a36aff57/py/server/tests/testbase.py#L42.
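For a standalone script, one alternative to a fixed sleep is to block until the UDF has actually run. The sketch below is plain Python and is not the wait_ticking_table_update helper from the Deephaven test suite; the name signal_on_call is hypothetical, and a timer thread stands in for the update graph. Whether a wrapped function resolves in a Deephaven query string the same way as the plain function is an assumption that would need to be checked.

```python
import functools
import threading
from typing import Callable, Tuple


def signal_on_call(func: Callable) -> Tuple[Callable, threading.Event]:
    """Wrap func so that an Event is set once a call to it has completed."""
    called = threading.Event()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            # Set after the call so a waiter only proceeds once func has run.
            called.set()

    return wrapper, called


def udf(x):
    """User-defined function, same body as in the reproducer."""
    a = x + 1
    return a


if __name__ == "__main__":
    wrapped_udf, udf_called = signal_on_call(udf)
    # Stand-in for another thread invoking the UDF some time later.
    threading.Timer(0.1, wrapped_udf, args=(1,)).start()
    reached = udf_called.wait(timeout=5)
    print("UDF was called" if reached else "timed out waiting for the UDF")
```

Waiting on the event with a timeout keeps the process alive only as long as needed, and a timeout makes it obvious when the UDF was never invoked, which is the situation described for the update_view cases.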
I repeated the experiment with Deephaven 0.36.1, and it is working fine for me.
It seems to work on both Java 11 and Java 17 for me. Both are OpenJDK.
@alexpeters1208 found that some PyCharm versions before 2024 did not work. PyCharm >= 2024 worked as did the latest VS Code.