inspire-next
global: correct logging
As part of #1370 we experimented with the current install of Sentry and discovered how we should write logging messages so that Sentry aggregates them into a single event without spamming everyone (that much).
We will be trying to use Sentry as the main point for monitoring the health of the legacy migration, and if it's not too spammy it would be cool to actually use it on QA.
TODOs:
- [x] Define Sentry-friendly log messages
- [ ] Fix all logging messages to be Sentry-friendly
Examples of duplicated Sentry entries caused by improper logging:
- https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601703/
- https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601692/
- https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601849/
- https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601838/
- https://sentry.cern.ch/inspire-sentry/inspire-labs-qa/group/601837/
- [...]
Which logger to use:
Use per-module loggers:
```python
logger = logging.getLogger(__name__)
```
or, in Celery tasks:
```python
logger = get_task_logger(__name__)
```
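For illustration, here is a minimal, self-contained sketch (the module and task names are placeholders, not existing INSPIRE code) of the two setups side by side:
```python
import logging

from celery import shared_task
from celery.utils.log import get_task_logger

# In an ordinary module:
logger = logging.getLogger(__name__)

# In a module defining Celery tasks; the task logger enriches
# records with the task name and id:
task_logger = get_task_logger(__name__)


@shared_task
def example_task(recid):
    # Lazy %-formatting, as described below, so Sentry can group by template.
    task_logger.info('Processing record %s', recid)
```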
Log in a Sentry-friendly way.
Exceptions
If you catch an exception and simply want to log it, use this:
```python
try:
    failing_code()
except ExceptionClass:
    logger.exception('Some message')
    # treatment of exception
```
The logger will automatically output the exception info, so there's no need to pass it explicitly.
Errors
To make Sentry group your errors by message, you have to use the following convention:
```python
logger.error('I have an error %s %s', arg1, arg2)
```
Behind the scenes, the logger will do `message % args`.
Things that do not work:
```python
logger.error('I have an error {} {}', arg1, arg2)
```
The logger wants to do `msg % args`, which does not work with `{}` placeholders.
```python
logger.error('I have an error {} {}'.format(arg1, arg2))
```
Sentry won't know how to group these, since it won't receive any pattern.
```python
logger.error('I have an error', arg1, arg2)
```
The logger won't be able to do the `msg % args` interpolation here.
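To see why the lazy `%`-style call matters for grouping, here is a small, generic illustration (not INSPIRE code): the handler receives the un-interpolated template in `record.msg`, which is what Sentry can use to group events, while `record.getMessage()` produces the rendered text.
```python
import logging


class TemplateInspector(logging.Handler):
    """Toy handler that prints what a logging handler actually receives."""

    def emit(self, record):
        print('template: %s' % record.msg)           # 'I have an error %s %s'
        print('rendered: %s' % record.getMessage())  # 'I have an error foo bar'


logger = logging.getLogger('grouping-demo')
logger.addHandler(TemplateInspector())
logger.propagate = False  # keep the demo output clean

logger.error('I have an error %s %s', 'foo', 'bar')
```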
@mihaibivol can you put this piece of documentation into the official docs?
@kaplun There are still some gaps in properly understanding how Sentry is initialized. So far it seems that we need to use `current_app.logger.log('anything')` so that all the handlers are put on all singletons. This is not final and needs improvement regarding which logger instance to use.
We can use pylint to find all remaining bad uses:
```
$ pylint --disable=all --enable=E1200,E1201,E1205,E1206,W1201,W1202 inspirehep
```
I ran your command and don't see any errors any longer. Not sure I fully understand what you meant, then.
This is what you should see:
```
************* Module inspirehep.modules.disambiguation.logic
W:379, 8: Specify string format arguments as logging function parameters (logging-not-lazy)
************* Module inspirehep.modules.disambiguation.tasks
W: 96, 8: Specify string format arguments as logging function parameters (logging-not-lazy)
************* Module inspirehep.modules.migrator.tasks.records
W:115,20: Use % formatting in logging functions and pass the % parameters as arguments (logging-format-interpolation)
W:334,21: Use % formatting in logging functions and pass the % parameters as arguments (logging-format-interpolation)
```
But actually I realized that this doesn't cover the various `obj.log` calls, so it's incomplete (and we don't have automatic tools for that).
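As a stopgap, a rough, hypothetical helper (not part of the repo; names are made up) could scan for the most common offender, `obj.log` calls that pre-format their message with `str.format` on a single line:
```python
import io
import os
import re

# Matches single-line obj.log.<level>(...) calls that pre-format the message.
PATTERN = re.compile(r'obj\.log\.\w+\(.*\.format\(')

for root, _dirs, files in os.walk('inspirehep'):
    for name in files:
        if not name.endswith('.py'):
            continue
        path = os.path.join(root, name)
        with io.open(path, encoding='utf-8') as source:
            for lineno, line in enumerate(source, 1):
                if PATTERN.search(line):
                    print('{0}:{1}: {2}'.format(path, lineno, line.strip()))
```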
Follow-up of @jmartinm's suggestion: verifying which logger to use.
Info: @jmartinm's PR inveniosoftware/invenio-logging#30 integrates the SentryHandler at `app.logger`.
For testing this, I've added a dummy `SENTRY_DSN` to `inspirehep.cfg` that simply changes the endpoint number (along with the other config entries here).
Running inspire-next with different `DEBUG` flag settings:
- `DEBUG = True`

  ```
  LOGGER.handlers []
  current_app.logger.handlers [
      <flask.logging.DebugHandler object at 0x10f9b53d0>,
      <flask.logging.ProductionHandler object at 0x10f972f90>,
      <raven.handlers.logging.SentryHandler object at 0x10f9e6fd0>
  ]
  LOGGER.parent.handlers [
      <flask.logging.DebugHandler object at 0x10f9b53d0>,
      <flask.logging.ProductionHandler object at 0x10f972f90>,
      <raven.handlers.logging.SentryHandler object at 0x10f9e6fd0>
  ]
  LOGGER.propagate 1
  current_app.logger.propagate False
  current_app.logger.getEffectiveLevel 10
  LOGGER.parent.getEffectiveLevel 10
  ```

- `DEBUG = False`

  ```
  LOGGER.handlers []
  current_app.logger.handlers [
      <flask.logging.DebugHandler object at 0x1143c13d0>,
      <flask.logging.ProductionHandler object at 0x1143f2f90>,
      <raven.handlers.logging.SentryHandler object at 0x10f9f98d0>
  ]
  LOGGER.parent.handlers [
      <flask.logging.DebugHandler object at 0x1143c13d0>,
      <flask.logging.ProductionHandler object at 0x1143f2f90>,
      <raven.handlers.logging.SentryHandler object at 0x10f9f98d0>
  ]
  LOGGER.propagate 1
  current_app.logger.propagate False
  current_app.logger.getEffectiveLevel 30
  LOGGER.parent.getEffectiveLevel 30
  ```
Note that:
- `propagate` is set to `True` on the Python logger.
- The handlers for `current_app.logger` and `LOGGER` are the exact same objects, in both cases.
We can conclude that logging will be equivalent whether we use `current_app.logger` or per-module loggers.
To have a uniform logging strategy, we should decide which one to use. (This will be the one recommended in the docs.)
Thanks for the analysis @chris-asl, in principle I would lean towards using the Flask mechanism, since it seems less verbose to do `current_app.logger.{info,debug}`.
On the other hand, this implies that whenever you want to log you need to have an application context; this might not be the case in some situations, and we would still need both mechanisms. Can you have a look at the cases where we use the default Python logger and check whether there is an application context?
Also, can you do the same comparison inside a Celery task to see if you get the same results (since the application is slightly different)?
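To illustrate the trade-off, here is a minimal, generic sketch (not INSPIRE code; the helper name is made up) of falling back to a module-level logger when no application context is available:
```python
import logging

from flask import Flask, current_app, has_app_context

module_logger = logging.getLogger(__name__)


def log_info(message):
    # Use the Flask logger when an app context is active, otherwise
    # fall back to the plain module logger.
    if has_app_context():
        current_app.logger.info(message)
    else:
        module_logger.info(message)


app = Flask(__name__)

log_info('logged via the module logger (no app context)')
with app.app_context():
    log_info('logged via current_app.logger')
```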
Logging usage in Celery tasks
This method by @david-caro ensures that a logger is assigned to a workflow object.
If we prove that all of the logger options (`current_app.logger`, the default Python logger, or `obj.log`) behave the same, then we can conclude that it's OK to choose whichever logging method.
1. DEBUG = True
I submitted a new record and then set a trace point after this line.
So, the aforementioned code was called before the `set_schema` method and the object already had a logger.
The table below shows each of the three loggers and traverses the logger hierarchy up until it reaches the root logger.
| logger | propagate | parent | handlers |
|---|---|---|---|
| **obj.log** | | | |
| `<flask.logging.DebugLogger object at 0x1174ae250>` | False | `<logging.RootLogger object at 0x110886e90>` | `[<flask.logging.DebugHandler object at 0x11852a1d0>, <flask.logging.ProductionHandler object at 0x1185bef50>, <raven.handlers.logging.SentryHandler object at 0x1185ce650>]` |
| `<logging.RootLogger object at 0x110886e90>` | 1 | None | `[<logging.StreamHandler object at 0x110f18dd0>, <raven.handlers.logging.SentryHandler object at 0x117b85d10>]` |
| **current_app.logger** | | | |
| `<flask.logging.DebugLogger object at 0x1174ae250>` | False | `<logging.RootLogger object at 0x110886e90>` | `[<flask.logging.DebugHandler object at 0x11852a1d0>, <flask.logging.ProductionHandler object at 0x1185bef50>, <raven.handlers.logging.SentryHandler object at 0x1185ce650>]` |
| `<logging.RootLogger object at 0x110886e90>` | 1 | None | `[<logging.StreamHandler object at 0x110f18dd0>, <raven.handlers.logging.SentryHandler object at 0x117b85d10>]` |
| **LOGGER** | | | |
| `<celery.utils.log.ProcessAwareLogger object at 0x11420b910>` | 1 | `<flask.logging.DebugLogger object at 0x1174ae250>` | `[]` |
| `<flask.logging.DebugLogger object at 0x1174ae250>` | False | `<logging.RootLogger object at 0x110886e90>` | `[<flask.logging.DebugHandler object at 0x11852a1d0>, <flask.logging.ProductionHandler object at 0x1185bef50>, <raven.handlers.logging.SentryHandler object at 0x1185ce650>]` |
| `<logging.RootLogger object at 0x110886e90>` | 1 | None | `[<logging.StreamHandler object at 0x110f18dd0>, <raven.handlers.logging.SentryHandler object at 0x117b85d10>]` |
2. DEBUG = False
| logger | propagate | parent | handlers |
|---|---|---|---|
| **obj.log** | | | |
| `<flask.logging.DebugLogger object at 0x110ccf990>` | False | `<logging.RootLogger object at 0x1099c6e90>` | `[<flask.logging.DebugHandler object at 0x11170e6d0>, <flask.logging.ProductionHandler object at 0x11170e790>, <raven.handlers.logging.SentryHandler object at 0x10d3d8ed0>]` |
| `<logging.RootLogger object at 0x1099c6e90>` | 1 | None | `[<logging.StreamHandler object at 0x10a058dd0>, <raven.handlers.logging.SentryHandler object at 0x110ccfa50>]` |
| **current_app.logger** | | | |
| `<flask.logging.DebugLogger object at 0x110ccf990>` | False | `<logging.RootLogger object at 0x1099c6e90>` | `[<flask.logging.DebugHandler object at 0x11170e6d0>, <flask.logging.ProductionHandler object at 0x11170e790>, <raven.handlers.logging.SentryHandler object at 0x10d3d8ed0>]` |
| `<logging.RootLogger object at 0x1099c6e90>` | 1 | None | `[<logging.StreamHandler object at 0x10a058dd0>, <raven.handlers.logging.SentryHandler object at 0x110ccfa50>]` |
| **LOGGER** | | | |
| `<celery.utils.log.ProcessAwareLogger object at 0x10d3d8910>` | 1 | `<flask.logging.DebugLogger object at 0x110ccf990>` | `[]` |
| `<flask.logging.DebugLogger object at 0x110ccf990>` | False | `<logging.RootLogger object at 0x1099c6e90>` | `[<flask.logging.DebugHandler object at 0x11170e6d0>, <flask.logging.ProductionHandler object at 0x11170e790>, <raven.handlers.logging.SentryHandler object at 0x10d3d8ed0>]` |
| `<logging.RootLogger object at 0x1099c6e90>` | 1 | None | `[<logging.StreamHandler object at 0x10a058dd0>, <raven.handlers.logging.SentryHandler object at 0x110ccfa50>]` |
We can see that with none of the logger options do log records pass through two SentryHandlers in their hierarchy (the Flask logger has `propagate` set to `False`, so nothing reaches the root logger's SentryHandler).
So there is probably another reason we're getting duplicate Sentry entries.
Examples of duplicate Sentry entries provided by @jacquerie here:
https://sentry.cern.ch/inspire-sentry/inspire-labs/group/824650/
https://sentry.cern.ch/inspire-sentry/inspire-labs/group/865674/
Additionally, we can see that `obj.log` is in both cases the Flask app logger.
For generating the tables I used the following snippet:
```python
header = "logger, propagate, parent, handlers\n"


def _logger_details(logger):
    return str(logger) + ", " + str(logger.propagate) + ", " + str(logger.parent) + ", " \
        + str(logger.handlers) + "\n"


def _generate_logger_hierarchy_info(logger):
    logger_info = ""
    while logger is not None:
        logger_info += _logger_details(logger)
        logger = logger.parent
    return logger_info


message = header
message += " # obj.log\n"
message += _generate_logger_hierarchy_info(obj.log)
message += " # current_app.logger\n"
message += _generate_logger_hierarchy_info(current_app.logger)
message += " # LOGGER\n"
message += _generate_logger_hierarchy_info(LOGGER)
```
Places we use the default Python logger
With `ag getLogger`, I got the Python logger usages below. The ones marked with `x` are the ones where I could `from flask import current_app` and use `current_app.logger`.
- [x] `modules/search/api.py` 44: `logger = logging.getLogger(__name__)`
- [x] `modules/workflows/tasks/submission.py` 47: `LOGGER = logging.getLogger(__name__)`
  Here, in the `send_robotupload` method, we're using both `LOGGER.debug` (after checking `if current_app.debug`, so we also have `current_app` in this context) and [`obj.log.debug`](https://github.com/inspirehep/inspire-next/blob/master/inspirehep/modules/workflows/tasks/submission.py#L231). So probably we can use `current_app.logger` (or even `obj.log`, since they are the same) here.
- [x] `modules/workflows/utils.py` 40: `LOGGER = logging.getLogger(__name__)`
- [x] `utils/normalizers.py` 29: `LOGGER = logging.getLogger(__name__)`
- [x] `utils/record_getter.py` 37: `LOGGER = logging.getLogger(__name__)`
- [ ] `cli.py` 37: `logging.getLogger('backoff').propagate = 0`
  In this context, `repr(current_app)` is `<LocalProxy unbound>` and `current_app.logger` raises:

  ```
  RuntimeError: Working outside of application context.
  This typically means that you attempted to use functionality that needed
  to interface with the current application object in a way. To solve
  this set up an application context with app.app_context(). See the
  documentation for more information.
  ```

  Here, I tried the `backoff` logger and `getLogger(__name__)`, and got this:

  | logger | propagate | parent | handlers |
  |---|---|---|---|
  | **backoff logger** | | | |
  | `<celery.utils.log.ProcessAwareLogger object at 0x10869d710>` | 0 | `<logging.RootLogger object at 0x1057a3650>` | `[<logging.NullHandler object at 0x10869d810>]` |
  | `<logging.RootLogger object at 0x1057a3650>` | 1 | None | `[]` |
  | **LOGGER** | | | |
  | `<celery.utils.log.ProcessAwareLogger object at 0x1085c5810>` | 1 | `<logging.RootLogger object at 0x1057a3650>` | `[]` |
  | `<logging.RootLogger object at 0x1057a3650>` | 1 | None | `[]` |
- [ ] `wsgi.py` 38: `logging.getLogger('backoff').propagate = 0`
  Same as the `cli.py` case.
- [ ] `celery.py` 37: `logging.getLogger('backoff').propagate = 0`
  Same as the `cli.py` case.
In general, do all workflow objects have the `log` attribute? If so, we could be using it for logging in workflow tasks.
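For illustration, a hypothetical workflow task (not existing INSPIRE code, assuming the `(obj, eng)` task signature used by the workflow tasks linked above) logging through `obj.log` would look like this:
```python
def example_workflow_task(obj, eng):
    # Per the analysis above, obj.log is the Flask app logger, so the same
    # lazy %-formatting rules apply for Sentry grouping.
    obj.log.info('Processing workflow object %s', obj.id)
```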
TODO it would be good to have links to duplicate entries here.
Here you are:
https://sentry.cern.ch/inspire-sentry/inspire-labs/group/824650/
https://sentry.cern.ch/inspire-sentry/inspire-labs/group/865674/
Thanks for this @jacquerie.
Following the links, we can see that these errors are related to `backoff` logging.
This is the only place where we handle logging a bit differently.
If the duplicate entries on Sentry are only related to `backoff`, then it is OK to use either of the two logging approaches everywhere else.
@david-caro @jacquerie @jmartinm what do you think?
I would use `get_task_logger` in all the Celery tasks (to add the extra info about the task and task id) and Python's general logger everywhere else. That way you don't really have to care whether there's a context or not; it will do the right thing by itself, one thing less to keep in mind.
I don't really see how that solves (or explains) the duplicate entries, though :/
Didn't know about `get_task_logger`; it seems like the appropriate solution for Celery tasks.
I also like the idea of using Python loggers.
If we reach a consensus on this, then I can add it to the documentation and also make the modifications to the code (where the `current_app` logger is being used).
Hey @chris-asl! Due to the technical problems that sentry.cern.ch is having, we switched in https://github.com/inspirehep/inspire-next/commit/1e00b6af4a5b56a0a8dcc6c72fcc032f8910076e to a recent version of raven that supports Sentry 8, so that we can use https://sentry.io.
Honestly it's so good that I don't think we should go back to Sentry 6 even if sentry.cern.ch gets repaired. This might require changes in the logger: can you investigate from https://docs.sentry.io/clients/python/integrations/logging/#usage what they recommend us to do?
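For reference, the linked raven logging docs amount to attaching a `SentryHandler` to the logging tree via `setup_logging`; a minimal sketch (the DSN is a placeholder, not a real project key) looks like this:
```python
import logging

from raven import Client
from raven.conf import setup_logging
from raven.handlers.logging import SentryHandler

client = Client('https://<key>@sentry.io/<project>')  # placeholder DSN

# Forward ERROR and above from the whole logging tree to Sentry.
handler = SentryHandler(client)
handler.setLevel(logging.ERROR)
setup_logging(handler)

logger = logging.getLogger(__name__)
logger.error('I have an error %s %s', 'foo', 'bar')  # still lazy %-formatting
```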