sentry-python

Sentry has ~8% overhead in Django test suites

Open danpalmer opened this issue 4 years ago • 11 comments

Apologies for a somewhat vague report; I'm happy to expand this once it's decided what the appropriate course of action is.

We've found that disabling Sentry (not calling init) in tests saves around 8% of test time on a large Django codebase. This result has been replicated by an engineer at another company with a different Django codebase.
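For illustration, a minimal sketch of what "not calling init" in tests could look like in a Django settings module. The `DISABLE_SENTRY` environment variable and the helper names `sentry_enabled` and `init_sentry` are hypothetical, and the test-run detection is only a heuristic; `sentry_sdk.init` and `DjangoIntegration` are the SDK's documented entry points.

```python
import os
import sys


def sentry_enabled(argv=None, environ=None):
    """Return False when this looks like a test run or Sentry is switched off."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # DISABLE_SENTRY is a hypothetical opt-out variable, not an SDK setting.
    if environ.get("DISABLE_SENTRY") == "1":
        return False

    # Heuristic: `manage.py test` has "test" as its first argument,
    # and pytest runs usually have "pytest" somewhere in argv[0].
    is_manage_test = len(argv) > 1 and argv[1] == "test"
    is_pytest = bool(argv) and "pytest" in argv[0]
    return not (is_manage_test or is_pytest)


def init_sentry(dsn):
    """Call sentry_sdk.init only outside test runs; return whether it was called."""
    if not sentry_enabled():
        return False
    # Imported lazily so that test runs never load the SDK at all.
    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    sentry_sdk.init(dsn=dsn, integrations=[DjangoIntegration()])
    return True
```

In `settings.py` this would be used as `init_sentry(dsn="...")` with your project's DSN. The trade-off raised in this issue still applies: the tests then no longer exercise the code paths that Sentry instruments.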

It could be that this is just the expected overhead, in which case I think documenting this would be great. Some advice in that documentation on whether the overhead is worth it would be great as well: is it worth 8% to ensure that Sentry doesn't interact badly with the rest of the codebase, or is Sentry reliable and isolated enough that it's unlikely to catch any issues, making the 8% time saving more important?

I wouldn't be surprised if this overhead doesn't show up in typical production use and tests are a weird case (they throw a lot of handled exceptions, for example).

Alternatively, it could be that this overhead is not expected and that this is a performance issue that Sentry would like to address. If so I'm happy to provide data from our test suite if you can point me towards what would be useful for you.

danpalmer avatar Apr 14 '20 09:04 danpalmer

I'm "the other engineer at another company". 😄

adamchainz avatar Apr 14 '20 09:04 adamchainz

Definitely want to investigate this, but I can't promise a fix yet because I don't know how pathological the slow codepath you're running into is.

I think a good start would be to figure out which test slows down the most so we can go from observed behavior of an entire testsuite to some sort of microbenchmark. Pytest has a --durations option that may help with that, I imagine other frameworks have something similar.
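One way to act on this, sketched under assumptions: run the suite twice with `pytest --durations=0` (once with Sentry, once without), save both reports, and rank tests by how much slower they got. The helper names `parse_durations` and `biggest_slowdowns` are made up for this example, and the regex assumes report lines of the form `0.30s call     path/to/test.py::test_name`.

```python
import re

# Matches report lines such as "0.30s call     tests/test_x.py::test_a".
_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+)")


def parse_durations(report_text):
    """Sum setup/call/teardown seconds per test id from a --durations report."""
    durations = {}
    for line in report_text.splitlines():
        match = _LINE.match(line)
        if match:
            seconds, _phase, test_id = match.groups()
            durations[test_id] = durations.get(test_id, 0.0) + float(seconds)
    return durations


def biggest_slowdowns(baseline, instrumented, top=5):
    """Return (delta_seconds, test_id) pairs for tests in both runs, slowest first."""
    common = baseline.keys() & instrumented.keys()
    deltas = [(instrumented[t] - baseline[t], t) for t in common]
    deltas.sort(reverse=True)
    return deltas[:top]
```

The tests at the top of that ranking are the natural candidates to reduce to a microbenchmark. Single runs are noisy, so averaging several runs per configuration before comparing is advisable.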

untitaker avatar Apr 14 '20 09:04 untitaker

I ran py-spy on the test suite and viewed the result in speedscope. It seems record_sql_queries accounts for most of the overhead.

[Image: Screenshot 2020-04-14 at 11 32 41]

(This screenshot doesn't show much information but I couldn't find any other to post without confidential info)

adamchainz avatar Apr 14 '20 10:04 adamchainz

record_sql_queries is a context manager that we wrap db statements with, so I am curious if this may include the entire db query duration?

untitaker avatar Apr 14 '20 11:04 untitaker

Sorry, it seems that this already shows the entire flamegraph for record_sql_queries... I'm still not sure how to interpret this graph. It seems a lot of time is spent in uuid4()?
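A rough way to check whether per-query ID generation could matter is a small microbenchmark. The sketch below uses a toy context manager, `toy_record_query`, which is a hypothetical stand-in and not sentry_sdk's actual record_sql_queries; it only imitates the pattern of creating a `uuid4()`-based ID for every wrapped statement.

```python
import timeit
import uuid
from contextlib import contextmanager


@contextmanager
def toy_record_query(spans, sql):
    # Hypothetical stand-in: creates one uuid4-based id per wrapped statement.
    span = {"id": uuid.uuid4().hex, "description": sql}
    yield span
    spans.append(span)


def per_call_overhead(number=10_000):
    """Estimate seconds added per wrapped statement, relative to a no-op."""
    spans = []

    def wrapped():
        with toy_record_query(spans, "SELECT 1"):
            pass  # the real query would run here

    def bare():
        pass

    t_wrapped = timeit.timeit(wrapped, number=number)
    t_bare = timeit.timeit(bare, number=number)
    return (t_wrapped - t_bare) / number


if __name__ == "__main__":
    print(f"~{per_call_overhead() * 1e6:.2f} microseconds per wrapped statement")
```

Multiplying the per-statement figure by the number of queries a test suite issues gives a rough upper bound on what this pattern alone could cost. The numbers depend heavily on the machine and Python version.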

untitaker avatar Apr 14 '20 13:04 untitaker

> (This screenshot doesn't show much information but I couldn't find any other to post without confidential info)

Yes, that's the flame graph for it. I can't show more, sorry.

I guess you can maybe eliminate query recording if you won't be sending the results anywhere?

adamchainz avatar Apr 14 '20 14:04 adamchainz

Ok. Can you comment out the install_sql_hook() call here and see if the overall run time of the testsuite drops significantly? Just wondering if it's just the sql instrumentation or something else as well.

untitaker avatar Apr 14 '20 14:04 untitaker

I'm afraid I don't have time to try this. I hope you take your own profiles from a representative application.

adamchainz avatar Apr 26 '20 16:04 adamchainz

@untitaker Commenting out install_sql_hook() drops the run time significantly.

I tried this on a Django test suite with ~500 tests. Results are averages over 4 test runs:

Without Sentry - 145s
With Sentry - 155s
With Sentry and the SQL hook disabled - 147s
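For reference, the relative overhead implied by these three averages can be worked out as follows. This is plain arithmetic on the figures reported above, with no new measurements.

```python
# Averages of 4 runs reported above, in seconds.
without_sentry = 145
with_sentry = 155
with_sentry_no_sql_hook = 147


def overhead_pct(measured, baseline):
    """Extra run time as a percentage of the baseline."""
    return 100 * (measured - baseline) / baseline


total = overhead_pct(with_sentry, without_sentry)
residual = overhead_pct(with_sentry_no_sql_hook, without_sentry)
# Share of the added time that disappears when the SQL hook is disabled.
sql_share = 100 * (with_sentry - with_sentry_no_sql_hook) / (with_sentry - without_sentry)

print(f"total overhead: {total:.1f}%")            # total overhead: 6.9%
print(f"without SQL hook: {residual:.1f}%")       # without SQL hook: 1.4%
print(f"SQL hook share of overhead: {sql_share:.0f}%")  # SQL hook share of overhead: 80%
```

So on this suite the SQL instrumentation accounts for roughly four fifths of the added time.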

Also, is there a way to estimate the approximate overhead of sentry profiling in production projects?

ChillarAnand avatar Oct 13 '20 14:10 ChillarAnand

Thanks @ChillarAnand that confirms the suspicion. I hope we can free resources internally to work on this.

> Also, is there a way to estimate the approximate overhead of sentry profiling in production projects?

This generally depends on the kind of web framework and what kind of extensions you are using -- 8% is definitely on the upper end, partly because we hook into a lot of Django. Integrations for AIOHTTP, Sanic and Flask capture much less data, so their overhead will be lower. However, if you install a lot of Flask extensions you may get close to the same overhead as Django.

untitaker avatar Oct 19 '20 09:10 untitaker

Updating this old thread. We now have a benchmark repository where we compare the overhead from a normal django app, one instrumented with Sentry and one instrumented with OpenTelemetry. There is also a related video explaining the setup and some numbers.

There is no official collection of numbers yet but I can update this thread again when we do.

sl0thentr0py avatar Dec 20 '21 17:12 sl0thentr0py

In general it's very hard to put a specific number on the SDK's overhead; it really depends on the app. I'll close this for now, but feel free to reopen or create a new issue if the overhead of Sentry on your specific app seems too high.

sentrivana avatar Aug 14 '23 12:08 sentrivana