Clean up and standardise logging

Open mikej888 opened this issue 6 years ago • 1 comments

Clean up and standardise logging. Relates to #5.

mikej888, Jan 15 '19 14:01

defoe/run_query.py does:

log = context._jvm.org.apache.log4j.LogManager.getLogger(__name__)

and passes this to queries via do_query. Messages logged this way appear on standard error when running spark-submit.

However, object model instances previously tried to log using Python's logging module:

from logging import getLogger

    ...
    self.logger = getLogger('py4j')

These messages do not appear on standard error when running spark-submit, because the code runs on the executor nodes. Extensive searching did not reveal a clear way to get the logs from the executors back to the master.

run_query.py's log object cannot be passed into object model instances, as this gives an error when Spark tries to pickle it, e.g.:

Traceback (most recent call last):
  File "/opt/cray/spark2/2.2_kubernetes.0000.201808240356_0027/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 148, in dump
    return Pickler.dump(self, obj)

A hack that worked from anywhere in the code was to write to a file on a distributed filesystem (e.g. Lustre on Urika) to which all executors have access.

def log(message):
    with open("/mnt/lustre/<user>/info.log", "a") as f:
        f.write(message)
        f.write("\n")

An approach that builds on this idea is to configure Python's logging using a YAML file, as described in https://docs.python.org/2/library/logging.config.html. A simple way of doing this is to check whether such a YAML file exists and, if it does, load it and use it to configure Python's logging module, with a flag recording whether logging has already been configured. The YAML file can specify any Python file logger and, as above, have it write to a file on a distributed filesystem to which all executors have access.
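As a rough illustration of what such a configuration might contain, the sketch below builds the dictionary that logging.config.dictConfig expects, which is the form a YAML configuration takes once it has been loaded. The make_config helper, the "simple" and "file" names and the format string are hypothetical and are not taken from configs/log.properties.yml.

```python
import logging
import logging.config


def make_config(log_path):
    """Build a dictConfig dictionary that appends log records to log_path.

    Hypothetical helper for illustration only; the keys follow the
    dictionary schema in the logging.config documentation.
    """
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {
                "format": "%(asctime)s %(name)s %(levelname)s %(message)s",
            },
        },
        "handlers": {
            "file": {
                # Extra keys such as filename and mode are passed to the
                # handler's constructor.
                "class": "logging.FileHandler",
                "formatter": "simple",
                "filename": log_path,
                "mode": "a",
            },
        },
        "root": {"level": "INFO", "handlers": ["file"]},
    }


# Example, using the placeholder path from the hack above:
# logging.config.dictConfig(make_config("/mnt/lustre/<user>/info.log"))
```

Written as YAML, the same structure would have version, formatters, handlers and root as top-level keys.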

The YAML file can be submitted to executors via a --files log.properties.yml argument to spark-submit.

Object model objects can use a logging_utils.get_logger helper function to get a logger and, as a side-effect, configure logging if it has not already been configured.

The logging branch contains a simple implementation. See:

  • configs/log.properties.yml
  • defoe/logging_utils.py
  • defoe/papers/issue.py
  • docs/run-queries.md (Configure logging)

mikej888, Jan 26 '19 14:01