raven.utils.__init__.get_versions extremely slow (on aws)

Open borisnieuwenhuis opened this issue 8 years ago • 4 comments

We ran into a situation where this call takes 12-15 seconds in a Django app on the first access of a log function after startup.

As a result, the first calls to log.error, log.warn, or anything else that has the Sentry handler configured are very slow, even though the default HTTP handler is async.

I didn't dive any deeper, and adding 'include_versions': False to RAVEN_CONFIG disabled it. Still, it might be wise not to have this enabled by default, to add a warning somewhere, or to initialize this in the async part of the request logging if possible; that might make it a bit better.

borisnieuwenhuis avatar Dec 08 '16 13:12 borisnieuwenhuis

I believe I'm experiencing the same issue. I have a large Django app with tens of apps. I ran a line profile on get_versions with my INSTALLED_APPS as the argument and found this (get_version_from_app was where all the time was being spent):

Total time: 32.966 s
File: /home/pete/.venvs/project/local/lib/python2.7/site-packages/raven/utils/__init__.py
Function: get_version_from_app at line 61

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    61                                           def get_version_from_app(module_name, app):
    62       142          137      1.0      0.0      version = None
    63                                           
    64                                               # Try to pull version from pkg_resource first
    65                                               # as it is able to detect version tagged with egg_info -b
    66       142          111      0.8      0.0      if pkg_resources is not None:
    67                                                   # pull version from pkg_resources if distro exists
    68       142           63      0.4      0.0          try:
    69       142     32963944 232140.5    100.0              return pkg_resources.get_distribution(module_name).version
    70       136          284      2.1      0.0          except Exception:
    71       136           72      0.5      0.0              pass
    72                                           
    73       136          310      2.3      0.0      if hasattr(app, 'get_version'):
    74         1            0      0.0      0.0          version = app.get_version
    75       135          146      1.1      0.0      elif hasattr(app, '__version__'):
    76         5            4      0.8      0.0          version = app.__version__
    77       130          154      1.2      0.0      elif hasattr(app, 'VERSION'):
    78         1            0      0.0      0.0          version = app.VERSION
    79       129          115      0.9      0.0      elif hasattr(app, 'version'):
    80                                                   version = app.version
    81                                           
    82       136          129      0.9      0.0      if callable(version):
    83         1           10     10.0      0.0          version = version()
    84                                           
    85       136          278      2.0      0.0      if not isinstance(version, (string_types, list, tuple)):
    86       129           65      0.5      0.0          version = None
    87                                           
    88       136           68      0.5      0.0      if version is None:
    89       129           48      0.4      0.0          return None
    90                                           
    91         7           10      1.4      0.0      if isinstance(version, (list, tuple)):
    92                                                   version = '.'.join(map(str, version))
    93                                           
    94         7            8      1.1      0.0      return str(version)

I haven't confirmed that this is the cause of my problem in production, but at the moment I believe it is. From a quick Google search, it seems that pkg_resources.get_distribution is known to be slow.
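To check which entries are responsible without a full line profile, something like the following rough sketch could help. It times an arbitrary lookup callable per module name and swallows errors the same way get_version_from_app does. The helper names time_lookups and slowest are made up for illustration and are not part of raven.

```python
import time


def time_lookups(names, lookup):
    """Return {name: seconds spent in lookup(name)}, ignoring lookup errors."""
    timings = {}
    for name in names:
        start = time.perf_counter()
        try:
            lookup(name)
        except Exception:
            # get_version_from_app also swallows failures, and a failed
            # lookup still costs time, so it is measured all the same.
            pass
        timings[name] = time.perf_counter() - start
    return timings


def slowest(timings, n=5):
    """Return the n (name, seconds) pairs with the largest durations."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)[:n]
```

Assuming pkg_resources is importable, calling slowest(time_lookups(settings.INSTALLED_APPS, pkg_resources.get_distribution)) from a Django shell should list the worst offenders.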

petedmarsh avatar Mar 14 '17 20:03 petedmarsh

Not really a solution to the problem, but if you are okay with dropping versions from your Sentry logs, adding 'include_versions': False to RAVEN_CONFIG seems to do the trick for me on Django.
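For anyone looking for the exact spot, a minimal settings sketch might look like this. The DSN is a placeholder, not a real value, and any other keys you already have in RAVEN_CONFIG stay as they are.

```python
# settings.py (Django): a sketch of the workaround described in this thread.
RAVEN_CONFIG = {
    # Placeholder DSN for illustration only; use your project's own.
    "dsn": "https://public_key@sentry.example.com/1",
    # Skip collecting module versions, which avoids the slow
    # get_versions call on the first logged event.
    "include_versions": False,
}
```

The cost is that events no longer carry the list of installed module versions.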

rafalp avatar Jun 25 '17 22:06 rafalp

This affects us as well; 'include_versions': False makes startup time go from 28 to 5 seconds.

HuffAndPuff avatar May 28 '18 16:05 HuffAndPuff

I haven't profiled this myself, but reading through the comments and the code in get_version_from_app, it seems like pkg_resources.get_distribution is the slow part, and it's getting called on everything in INSTALLED_APPS. get_version_from_app has a lot of cheaper shortcuts that avoid pkg_resources.get_distribution, but they are only tried after the get_distribution lookup has already been attempted and failed (or when pkg_resources can't be imported), so the slow call is paid for every app.

It seems like https://github.com/getsentry/raven-python/compare/master...AlexRiina:master might help.
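To illustrate the general idea (this is only a sketch and not necessarily what the linked branch does), the cheap attribute checks could be tried first, with the expensive distribution lookup used only as a fallback. The names cheap_version, get_version_fast, and slow_lookup are hypothetical and do not exist in raven.

```python
def cheap_version(app):
    """Read a version from common module attributes; return a str or None."""
    version = None
    # Same attribute order as the fallbacks in get_version_from_app.
    for attr in ("get_version", "__version__", "VERSION", "version"):
        if hasattr(app, attr):
            version = getattr(app, attr)
            break
    if callable(version):
        version = version()
    if isinstance(version, (list, tuple)):
        return ".".join(map(str, version))
    if isinstance(version, str):
        return version
    return None


def get_version_fast(module_name, app, slow_lookup=None):
    """Prefer cheap attribute checks; call slow_lookup only when they fail."""
    version = cheap_version(app)
    if version is not None or slow_lookup is None:
        return version
    try:
        # slow_lookup stands in for something like
        # lambda name: pkg_resources.get_distribution(name).version
        return slow_lookup(module_name)
    except Exception:
        return None
```

The trade-off is that the existing code deliberately asks pkg_resources first because it can detect versions tagged with egg_info -b, so reordering may change the reported version for some packages.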

AlexRiina avatar May 28 '18 19:05 AlexRiina