sentry-python
Unable to send unhandled exceptions from PySpark jobs to Sentry.
Hi, I followed this documentation (https://docs.sentry.io/platforms/python/guides/pyspark/) and I was not able to send unhandled exceptions to Sentry.
I was only able to do it using the capture_event and capture_exception methods, but that requires catching the exception first.
Also, implementing the daemon as described in the documentation seems to have no effect (the documented worker setup is sketched below for reference).
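For context, the worker-side setup that documentation describes boils down to a custom daemon module along these lines (a sketch; the DSN and file names are placeholders, and the spark-submit options are shown as comments):

# sentry_daemon.py -- shipped to the cluster (e.g. via --py-files) so each
# Python worker initializes the SDK before running tasks.
import sentry_sdk
from sentry_sdk.integrations.spark import SparkWorkerIntegration
import pyspark.daemon as original_daemon

sentry_sdk.init(
    "SENTRY_DSN",  # placeholder DSN
    integrations=[SparkWorkerIntegration()],
)

if __name__ == "__main__":
    original_daemon.manager()

# Submitted with, for example:
# spark-submit --py-files sentry_daemon.py \
#   --conf spark.python.use.daemon=true \
#   --conf spark.python.daemon.module=sentry_daemon my_job.py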
The Python code below works as expected: the exception "Sentry Poc Test" is unhandled and is sent to Sentry when executed in a local Windows environment.
import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

if __name__ == "__main__":
    sentry_sdk.init("SENTRY_DSN", integrations=[SparkIntegration()])
    raise Exception("Sentry Poc Test")
For PySpark, the only way I found is handling the exception myself. The job is deployed to the cluster using dbx, and the sentry-sdk dependency is deployed correctly:
import sentry_sdk
from sentry_sdk.utils import exc_info_from_error, exceptions_from_error_tuple

try:
    my_function(*args, **kwargs)
except Exception as err:
    message = "An exception occurred"
    with sentry_sdk.push_scope() as scope:
        exc_info = exc_info_from_error(err)
        exceptions = exceptions_from_error_tuple(exc_info)
        scope.set_extra("exception", exceptions)
        sentry_sdk.capture_event({
            "message": message,
            "level": "error",
            "exception": {
                "values": exceptions
            },
        })
    raise
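The same workaround can be written more compactly with capture_exception, which picks up the active exception on its own (a sketch, assuming sentry_sdk.init has already been called on the driver):

import sentry_sdk

try:
    my_function(*args, **kwargs)
except Exception:
    # With no argument, capture_exception() reports the exception currently
    # being handled (taken from sys.exc_info()).
    sentry_sdk.capture_exception()
    raise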
Am I missing something? Could you please advise how I should implement this so that unhandled exceptions from PySpark are sent to Sentry? Thanks.
Hey, thanks for writing in. What version of Spark are you using?
The SDK should be catching unhandled exceptions. Could you pass the debug=True kwarg to sentry_sdk.init and share the SDK's debug logs?
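Something along these lines on the driver (the DSN is a placeholder):

import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

sentry_sdk.init(
    "SENTRY_DSN",  # placeholder DSN
    debug=True,    # print the SDK's internal debug log to stderr
    integrations=[SparkIntegration()],
)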
Hi! Thanks for your answer. I ran tests with clusters configured with Databricks runtime versions 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12) and 9.0 (includes Apache Spark 3.1.2, Scala 2.12).
And this is the output with debug=True:
[sentry] DEBUG: Setting up integrations (with default = True)
[sentry] DEBUG: Setting up previously not enabled integration spark
[sentry] DEBUG: Setting up previously not enabled integration logging
[sentry] DEBUG: Setting up previously not enabled integration stdlib
[sentry] DEBUG: Setting up previously not enabled integration excepthook
[sentry] DEBUG: Setting up previously not enabled integration dedupe
[sentry] DEBUG: Setting up previously not enabled integration atexit
[sentry] DEBUG: Setting up previously not enabled integration modules
[sentry] DEBUG: Setting up previously not enabled integration argv
[sentry] DEBUG: Setting up previously not enabled integration threading
[sentry] DEBUG: Enabling integration spark
[sentry] DEBUG: Enabling integration logging
[sentry] DEBUG: Enabling integration stdlib
[sentry] DEBUG: Enabling integration excepthook
[sentry] DEBUG: Enabling integration dedupe
[sentry] DEBUG: Enabling integration atexit
[sentry] DEBUG: Enabling integration modules
[sentry] DEBUG: Enabling integration argv
[sentry] DEBUG: Enabling integration threading
Yeah I think the issue here is with Spark 3 (as our test matrix runs against Spark 2). We'll have to take some time to investigate and see what changes we need to make to fix this.
Contributions are welcome though if anyone wants to help out!
Hello Sentry, I'm wondering if you have any update on this request? Thanks!
+1 Also having this issue.
Is there any progress? Any direction on where to look if I would like to contribute?
We no longer actively test PySpark in our CI matrix, so the integration is a little unmaintained. We might even consider dropping support, because we don't have the resources to actively help triage issues.
For now, I would recommend taking a look at the tests and seeing what failures they produce with the latest PySpark version: https://github.com/getsentry/sentry-python/blob/master/tests/integrations/spark/test_spark.py
There might be better approaches for using the daemon than what I came up with 3+ years ago.
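For example, after installing a recent pyspark and the test requirements into a checkout of this repo, that test module can be run directly from the repository root, roughly:

# Hypothetical helper; equivalent to invoking pytest from the command line.
import sys
import pytest

sys.exit(pytest.main(["tests/integrations/spark/test_spark.py", "-v"]))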