sentry-python
Unable to send unhandled exceptions from PySpark jobs to Sentry.
Hi, I followed this documentation (https://docs.sentry.io/platforms/python/guides/pyspark/) and I was not able to send unhandled exceptions to Sentry.
I was only able to do it using the capture_event and capture_exception methods, but that requires catching the exception first.
Also, implementing the daemon as described in the documentation seems to have no effect (the documented worker setup is sketched below for reference).
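For context, the worker-side setup that documentation describes boils down to a custom daemon module along these lines (a sketch; the DSN and file names are placeholders, and the spark-submit options are shown as comments):

# sentry_daemon.py -- shipped to the cluster (e.g. via --py-files) so each
# Python worker initializes the SDK before running tasks.
import sentry_sdk
from sentry_sdk.integrations.spark import SparkWorkerIntegration
import pyspark.daemon as original_daemon

sentry_sdk.init(
    "SENTRY_DSN",  # placeholder DSN
    integrations=[SparkWorkerIntegration()],
)

if __name__ == "__main__":
    original_daemon.manager()

# Submitted with, for example:
# spark-submit --py-files sentry_daemon.py \
#   --conf spark.python.use.daemon=true \
#   --conf spark.python.daemon.module=sentry_daemon my_job.py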
The Python code below works as expected: the exception "Sentry Poc Test" is unhandled and is sent to Sentry when executed in a local Windows environment.
import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

if __name__ == "__main__":
    sentry_sdk.init("SENTRY_DSN", integrations=[SparkIntegration()])
    raise Exception("Sentry Poc Test")
For PySpark, the only way I found is handling the exception myself. The job is deployed to the cluster using dbx, and the sentry-sdk dependency is deployed correctly:
import sentry_sdk
from sentry_sdk.utils import exc_info_from_error, exceptions_from_error_tuple

try:
    my_function(*args, **kwargs)
except Exception as err:
    message = "An exception occurred"
    with sentry_sdk.push_scope() as scope:
        exc_info = exc_info_from_error(err)
        exceptions = exceptions_from_error_tuple(exc_info)
        scope.set_extra("exception", exceptions)
        sentry_sdk.capture_event({
            "message": message,
            "level": "error",
            "exception": {
                "values": exceptions
            },
        })
    raise
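The same workaround can be written more compactly with capture_exception, which picks up the active exception on its own (a sketch, assuming sentry_sdk.init has already been called on the driver):

import sentry_sdk

try:
    my_function(*args, **kwargs)
except Exception:
    # With no argument, capture_exception() reports the exception currently
    # being handled (taken from sys.exc_info()).
    sentry_sdk.capture_exception()
    raise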
Am I missing something? Could you please advise how I should implement this so that unhandled exceptions from PySpark are sent to Sentry? Thanks.
Hey, thanks for writing in. What version of Spark are you using?
The SDK should be catching unhandled exceptions. Could you pass the debug=True kwarg to sentry_sdk.init and share the SDK's debug logs?
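Something along these lines on the driver (the DSN is a placeholder):

import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

sentry_sdk.init(
    "SENTRY_DSN",  # placeholder DSN
    debug=True,    # print the SDK's internal debug log to stderr
    integrations=[SparkIntegration()],
)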
Hi! Thanks for your answer. I ran tests with clusters configured with Databricks runtime versions 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12) and 9.0 (includes Apache Spark 3.1.2, Scala 2.12).
And this is the output with debug=True:
[sentry] DEBUG: Setting up integrations (with default = True)
[sentry] DEBUG: Setting up previously not enabled integration spark
[sentry] DEBUG: Setting up previously not enabled integration logging
[sentry] DEBUG: Setting up previously not enabled integration stdlib
[sentry] DEBUG: Setting up previously not enabled integration excepthook
[sentry] DEBUG: Setting up previously not enabled integration dedupe
[sentry] DEBUG: Setting up previously not enabled integration atexit
[sentry] DEBUG: Setting up previously not enabled integration modules
[sentry] DEBUG: Setting up previously not enabled integration argv
[sentry] DEBUG: Setting up previously not enabled integration threading
[sentry] DEBUG: Enabling integration spark
[sentry] DEBUG: Enabling integration logging
[sentry] DEBUG: Enabling integration stdlib
[sentry] DEBUG: Enabling integration excepthook
[sentry] DEBUG: Enabling integration dedupe
[sentry] DEBUG: Enabling integration atexit
[sentry] DEBUG: Enabling integration modules
[sentry] DEBUG: Enabling integration argv
[sentry] DEBUG: Enabling integration threading
Yeah I think the issue here is with Spark 3 (as our test matrix runs against Spark 2). We'll have to take some time to investigate and see what changes we need to make to fix this.
Contributions are welcome though if anyone wants to help out!
Hello Sentry, I'm wondering if you have any update on this request? Thanks!
+1 Also having this issue.
Is there any progress? Any direction on where to look if I would like to contribute?
We no longer actively test PySpark in our CI matrix, so the integration is a little unmaintained. We might even consider dropping support, because we don't have the resources to actively help triage issues.
For now, I would recommend taking a look at the tests and seeing what failures they produce with the latest PySpark version: https://github.com/getsentry/sentry-python/blob/master/tests/integrations/spark/test_spark.py
There might be better approaches for using the daemon than what I came up with 3+ years ago.
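For example, after installing a recent pyspark and the test requirements into a checkout of this repo, that test module can be run directly from the repository root, roughly:

# Hypothetical helper; equivalent to invoking pytest from the command line.
import sys
import pytest

sys.exit(pytest.main(["tests/integrations/spark/test_spark.py", "-v"]))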