inspire-next icon indicating copy to clipboard operation
inspire-next copied to clipboard

global: correct logging

Open mihaibivol opened this issue 8 years ago • 15 comments

As part of #1370 we experimented with the current install of Sentry and discovered how we should write logging messages so that Sentry aggregates them into a single event without spamming everyone (that much).

We will be trying to use Sentry as the main point of monitoring the health of the legacy migration and if it's not to spammy it would be cool to actually use it on QA.

TODOs:

  • [x] Define Sentry Friendly Log messages
  • [ ] Fix all logging messages to be Sentry Friendly

mihaibivol avatar Jul 27 '16 13:07 mihaibivol

Examples of duplicated Sentry entries caused by improper logging:

  • https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601703/
  • https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601692/
  • https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601849/
  • https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601838/
  • https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601837/
  • [...]

jacquerie avatar Aug 02 '16 20:08 jacquerie

Which logger to use:

Use per module loggers:

logger = logging.getLogger(__name__)

or, in Celery tasks

logger = get_task_logger(__name__)

Log in a Sentry Friendly way.

Exceptions

If you catch an exception and you just want to simply log it use this:

try:
   failing_code()
except ExceptionClass:
    logger.exception('Some message')
    # treatment of exception

The logger wil automatically output the exception info so there's no need to be explicit about it

Errors

For making Sentry group your exceptions by the message you have to use the following convetion:

logger.error('I have an error %s %s', arg1, arg2)

Behind the scenes, the logger will do:

message % args

Things that do not work

logger.error('I have an error {} {}', arg1, arg2)

The logger wants to do msg % args

logger.error('I have an error {} {}'.format(arg1, arg2))

Sentry won't know how to group since it won't receive any pattern

logger.error('I have an error', arg1, arg2)

The logger won't do the msg % args thing

mihaibivol avatar Aug 04 '16 15:08 mihaibivol

@mihaibivol can you put this piece of documentation into the official docs?

kaplun avatar Aug 09 '16 08:08 kaplun

@kaplun There are still some shortfalls in properly understanding how Sentry is initialized. So far It seems that we need to use current_app.logger.log('anything') so that all the handlers are put on all singletons. This thing is not final and needs improvement in the part of understanding which logger instance to use.

mihaibivol avatar Aug 09 '16 09:08 mihaibivol

We can use pylint to find all remaining bad uses:

$ pylint --disable=all --enable=E1200,E1201,E1205,E1206,W1201,W1202 inspirehep

jacquerie avatar Apr 17 '17 09:04 jacquerie

I ran your command and don't see any error any longer. Not sure I understand fully what you meant then.

kaplun avatar Apr 19 '17 13:04 kaplun

This is what you should see:

************* Module inspirehep.modules.disambiguation.logic
W:379, 8: Specify string format arguments as logging function parameters (logging-not-lazy)
************* Module inspirehep.modules.disambiguation.tasks
W: 96, 8: Specify string format arguments as logging function parameters (logging-not-lazy)
************* Module inspirehep.modules.migrator.tasks.records
W:115,20: Use % formatting in logging functions and pass the % parameters as arguments (logging-format-interpolation)
W:334,21: Use % formatting in logging functions and pass the % parameters as arguments (logging-format-interpolation)

But actually I realized that this doesn't cover the various obj.log calls, so it's incomplete (and we don't have automatic tools for that).

jacquerie avatar Apr 19 '17 15:04 jacquerie

Follow-up of @jmartinm suggestion, verifying which logger to use.

Info: @jmartinm's PR inveniosoftware/invenio-logging#30 integrates the SentryHandler at the app.logger.

For testing this, I've added a dummy SENTRY_DSN at inspirehep.cfg that simply changes the endpoint number (along with the other config entries here).

Running inspire-next with different DEBUG flag settings:

  1. DEBUG = True
LOGGER.handlers []
current_app.logger.handlers [
  <flask.logging.DebugHandler object at 0x10f9b53d0>,
  <flask.logging.ProductionHandler object at 0x10f972f90>,
  <raven.handlers.logging.SentryHandler object at 0x10f9e6fd0>
]
LOGGER.parent.handlers [
  <flask.logging.DebugHandler object at 0x10f9b53d0>,
  <flask.logging.ProductionHandler object at 0x10f972f90>,
  <raven.handlers.logging.SentryHandler object at 0x10f9e6fd0>
]
LOGGER.propagate 1
current_app.logger.propagate False
current_app.logger.getEffectiveLevel 10
LOGGER.parent.getEffectiveLevel 10
  1. DEBUG = False
LOGGER.handlers []
current_app.logger.handlers [
  <flask.logging.DebugHandler object at 0x1143c13d0>,
  <flask.logging.ProductionHandler object at 0x1143f2f90>,
  <raven.handlers.logging.SentryHandler object at 0x10f9f98d0>
]
LOGGER.parent.handlers [
  <flask.logging.DebugHandler object at 0x1143c13d0>,
  <flask.logging.ProductionHandler object at 0x1143f2f90>,
  <raven.handlers.logging.SentryHandler object at 0x10f9f98d0>
]
LOGGER.propagate 1
current_app.logger.propagate False
current_app.logger.getEffectiveLevel 30
LOGGER.parent.getEffectiveLevel 30

Note that:

  • propagate is set to True on Python logger.
  • The handlers for current_app.logger and LOGGER are the exact same objects, in any case.

We can conclude that using either current_app.logger or per module loggers, logging will be equivalent.

For having a uniform logging strategy, we should decide which one to use. (This will be the one recommended in the docs.)

chris-asl avatar Sep 29 '17 12:09 chris-asl

Thanks for the analysis @chris-asl , in principle I would lean towards using the Flask mechanism since it seems less verbose to do current_app.logger.{info,debug}.

On the other hand, this implies that whenever you want to log you need to have an application context so this might not be the case in some situations and we will still need both mechanisms. Can you have a look at the cases where we use the default Python logger and see if there is an application context.

Also can you do the same comparison inside of a Celery task to see if you get the same results (since the application is slightly different) ?

jmartinm avatar Sep 29 '17 16:09 jmartinm

Logging usage in Celery tasks

This method by @david-caro, ensures that a logger is assigned to a workflow object. If we prove that any of the logger options (either current_app.logger, or Python default logger or obj.log) behave the same, then we can conclude that it's ok to choose whichever logging method.

1. DEBUG = True

I submitted a new record and then set a trace point after this line. So, the aforementioned code was called before set_schema method and the object did already have a logger.

The table below shows each one of the three loggers and traverses the logger hierarchy up, until it finds the root one.

logger propagate parent handlers
obj.log
<flask.logging.DebugLogger object at 0x1174ae250> False <logging.RootLogger object at 0x110886e90> [<flask.logging.DebugHandler object at 0x11852a1d0> <flask.logging.ProductionHandler object at 0x1185bef50> <raven.handlers.logging.SentryHandler object at 0x1185ce650>]
<logging.RootLogger object at 0x110886e90> 1 None [<logging.StreamHandler object at 0x110f18dd0> <raven.handlers.logging.SentryHandler object at 0x117b85d10>]
current_app.logger
<flask.logging.DebugLogger object at 0x1174ae250> False <logging.RootLogger object at 0x110886e90> [<flask.logging.DebugHandler object at 0x11852a1d0> <flask.logging.ProductionHandler object at 0x1185bef50> <raven.handlers.logging.SentryHandler object at 0x1185ce650>]
<logging.RootLogger object at 0x110886e90> 1 None [<logging.StreamHandler object at 0x110f18dd0> <raven.handlers.logging.SentryHandler object at 0x117b85d10>]
LOGGER
<celery.utils.log.ProcessAwareLogger object at 0x11420b910> 1 <flask.logging.DebugLogger object at 0x1174ae250> []
<flask.logging.DebugLogger object at 0x1174ae250> False <logging.RootLogger object at 0x110886e90> [<flask.logging.DebugHandler object at 0x11852a1d0> <flask.logging.ProductionHandler object at 0x1185bef50> <raven.handlers.logging.SentryHandler object at 0x1185ce650>]
<logging.RootLogger object at 0x110886e90> 1 None [<logging.StreamHandler object at 0x110f18dd0> <raven.handlers.logging.SentryHandler object at 0x117b85d10>]

2. DEBUG = False

logger propagate parent handlers
obj.log
<flask.logging.DebugLogger object at 0x110ccf990> False <logging.RootLogger object at 0x1099c6e90> [<flask.logging.DebugHandler object at 0x11170e6d0> <flask.logging.ProductionHandler object at 0x11170e790> <raven.handlers.logging.SentryHandler object at 0x10d3d8ed0>]
<logging.RootLogger object at 0x1099c6e90> 1 None [<logging.StreamHandler object at 0x10a058dd0> <raven.handlers.logging.SentryHandler object at 0x110ccfa50>]
current_app.logger
<flask.logging.DebugLogger object at 0x110ccf990> False <logging.RootLogger object at 0x1099c6e90> [<flask.logging.DebugHandler object at 0x11170e6d0> <flask.logging.ProductionHandler object at 0x11170e790> <raven.handlers.logging.SentryHandler object at 0x10d3d8ed0>]
<logging.RootLogger object at 0x1099c6e90> 1 None [<logging.StreamHandler object at 0x10a058dd0> <raven.handlers.logging.SentryHandler object at 0x110ccfa50>]
LOGGER
<celery.utils.log.ProcessAwareLogger object at 0x10d3d8910> 1 <flask.logging.DebugLogger object at 0x110ccf990> []
<flask.logging.DebugLogger object at 0x110ccf990> False <logging.RootLogger object at 0x1099c6e90> [<flask.logging.DebugHandler object at 0x11170e6d0> <flask.logging.ProductionHandler object at 0x11170e790> <raven.handlers.logging.SentryHandler object at 0x10d3d8ed0>]
<logging.RootLogger object at 0x1099c6e90> 1 None [<logging.StreamHandler object at 0x10a058dd0> <raven.handlers.logging.SentryHandler object at 0x110ccfa50>]

We can see that through none of the logger options, logger records can be transmitted through two SentryHandlers on their hierarchy. So, probably there's another reason we're getting duplicate Sentry entries. Examples of duplicate Sentry entries provided by @jacquerie here:
https://sentry.cern.ch/inspire-sentry/inspire-labs/group/824650/
https://sentry.cern.ch/inspire-sentry/inspire-labs/group/865674/

Additionally, we can see that obj.log is in the two cases the Flask app logger.

For generating the table I used this (click).

header = "logger, propagate, parent, handlers\n"

def _logger_details(logger):
    return str(logger) + ", " + str(logger.propagate) + ", " + str(logger.parent) + ", " \
           + str(logger.handlers) + "\n"

def _generate_logger_hierarchy_info(logger):
    logger_info = ""
    while logger is not None:
        logger_info += _logger_details(logger)
        logger = logger.parent
    return logger_info

message = header
message += " # obj.log\n"
message += _generate_logger_hierarchy_info(obj.log)
message += " # current_app.logger\n"
message += _generate_logger_hierarchy_info(current_app.logger)
message += " # LOGGER\n"
message += _generate_logger_hierarchy_info(LOGGER)

Places we use the default Python logger

With ag getLogger, I got the python logger usages below. The ones with x are the ones where I could from flask import current_app and use the current_app.logger.

  • [x] modules/search/api.py 44:logger = logging.getLogger(__name__)

  • [x] modules/workflows/tasks/submission.py 47:LOGGER = logging.getLogger(__name__) Here, in the send_robotupload method, we're using both LOGGER.debug (after checking if current_app.debug, so also have the current_app in this context) and [obj.log.debug] (https://github.com/inspirehep/inspire-next/blob/master/inspirehep/modules/workflows/tasks/submission.py#L231). So probably we can use current_app.logger (or even the obj.log, since they are the same) here.

  • [x] modules/workflows/utils.py 40:LOGGER = logging.getLogger(__name__)

  • [x] utils/normalizers.py 29:LOGGER = logging.getLogger(__name__)

  • [x] utils/record_getter.py 37:LOGGER = logging.getLogger(__name__)

  • [ ] cli.py 37:logging.getLogger('backoff').propagate = 0 In this context, repr(current_app) == (<LocalProxy unbound>) and current_app.logger raises:

RuntimeError: Working outside of application context.

This typically means that you attempted to use functionality that needed
to interface with the current application object in a way.  To solve
this set up an application context with app.app_context().  See the
documentation for more information.

Here, I tried the backoff-logger and getLogger(__name__), and got this:

logger propagate parent handlers
backoff-logger
<celery.utils.log.ProcessAwareLogger object at 0x10869d710> 0 <logging.RootLogger object at 0x1057a3650> [<logging.NullHandler object at 0x10869d810>]
<logging.RootLogger object at 0x1057a3650> 1 None []
LOGGER
<celery.utils.log.ProcessAwareLogger object at 0x1085c5810> 1 <logging.RootLogger object at 0x1057a3650> []
<logging.RootLogger object at 0x1057a3650> 1 None []
  • [ ] wsgi.py 38:logging.getLogger('backoff').propagate = 0 Same as cli.py case.

  • [ ] celery.py 37:logging.getLogger('backoff').propagate = 0 Same as cli.py case.

In general, do all workflow objects have the log attribute? If so, we could be using this for logging in workflow tasks.

chris-asl avatar Oct 04 '17 14:10 chris-asl

TODO it would be good to have links to duplicate entries here.

Here you are:

https://sentry.cern.ch/inspire-sentry/inspire-labs/group/824650/ https://sentry.cern.ch/inspire-sentry/inspire-labs/group/865674/

jacquerie avatar Oct 04 '17 17:10 jacquerie

Thanks for this @jacquerie.

Following the links we can see that these errors are related to backoff logging.
This is the only place where we handle logging a bit differently.

If the duplicate entries on Sentry are only related to backoff, then it is ok to use whichever of the two logging ways in any other place.

@david-caro @jacquerie @jmartinm what do you think?

chris-asl avatar Oct 09 '17 16:10 chris-asl

I would use the get_task_logger on all the celery tasks (to add the extra info about the task and task-id) and python's general logger everywhere else, that way you don't really have to care if there's a context or not, it will do the right thing by itself, one thing less to keep in mind.

I don't really see how that solves (or explains) the duplicate entries though :/

david-caro avatar Oct 10 '17 08:10 david-caro

Didn't know about get_task_logger, seems like the appropriate solution for celery tasks. I also like the idea of using python loggers.

If we reach to a consensus on this one, then I can add it to the documentation and also do the modifications on the code (where the current_app logger is being used).

chris-asl avatar Oct 19 '17 11:10 chris-asl

Hey @chris-asl! Due to the technical problems that sentry.cern.ch is having, we switched in https://github.com/inspirehep/inspire-next/commit/1e00b6af4a5b56a0a8dcc6c72fcc032f8910076e to a recent version of raven that supports Sentry 8, so that we can use https://sentry.io.

Honestly it's so good that I don't think we should go back to Sentry 6 even if sentry.cern.ch gets repaired. This might require changes in the logger: can you investigate from https://docs.sentry.io/clients/python/integrations/logging/#usage what do they recommend us to do?

jacquerie avatar Oct 26 '17 08:10 jacquerie