defoe
Clean up and standardise logging
Clean up and standardise logging. Relates to #5.
defoe/run_query.py does:
    log = context._jvm.org.apache.log4j.LogManager.getLogger(__name__)
and passes this to queries via do_query. These logs appear in the standard error when running spark-submit.
However, object model instances used to create their own logger like this:
    from logging import getLogger
    ...
    self.logger = getLogger('py4j')
These logs do not appear in the standard error when running spark-submit as the code is run on executor nodes. A lot of Googling didn't reveal a clear way to get the logs from the executors to the master.
run_query.py's log object cannot be passed into object model instances, as this gives an error, e.g.:
    Traceback (most recent call last):
      File "/opt/cray/spark2/2.2_kubernetes.0000.201808240356_0027/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 148, in dump
        return Pickler.dump(self, obj)
A hack that worked from anywhere in the code was to write to a file on a distributed filesystem (e.g. Lustre on Urika) to which all executors have access.
    def log(message):
        with open("/mnt/lustre/<user>/info.log", "a") as f:
            f.write(message)
            f.write("\n")
Building on this concept, a more general approach is to configure Python's logging module using a YAML file, as described in https://docs.python.org/2/library/logging.config.html. A simple way of doing this is to check for the existence of such a YAML file and, if present, load it and configure the logging module, using a flag to record whether logging has already been configured. The YAML file can specify any of Python's file-based logging handlers and, as above, write to a file on a distributed filesystem to which all executors have access.
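As a rough illustration of what such a configuration might contain, the sketch below builds the equivalent dictionary in Python and passes it to logging.config.dictConfig, which is where a loaded YAML file would end up. The handler name, format string and log path are assumptions made for illustration (a temporary file stands in for a shared path such as the Lustre one above); they are not taken from configs/log.properties.yml.

```python
import logging
import logging.config
import os
import tempfile

# Assumption: a temporary file stands in for a file on a shared filesystem
# that every executor can write to.
LOG_PATH = os.path.join(tempfile.mkdtemp(), "info.log")

# Hypothetical configuration. A YAML file with the same keys would load
# into a dictionary of this shape.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "shared_file": {
            "class": "logging.FileHandler",
            "filename": LOG_PATH,
            "mode": "a",  # append, as in the hand-written log() hack
            "formatter": "simple",
        },
    },
    "root": {"level": "INFO", "handlers": ["shared_file"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("defoe.example").info("hello from an executor")
```

Because the handler is attached to the root logger, any logger obtained with logging.getLogger in the same process writes to the same file.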
The YAML file can be submitted to executors via a --files log.properties.yml argument to spark-submit.
Object model objects can use a logging_utils.get_logger helper function to get a logger and, as a side-effect, configure logging if it has not already been configured.
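A minimal sketch of how such a helper could look is given below. It is not the code from the logging branch: the configure_logging function, the module-level flag and the assumption that log.properties.yml sits in the current working directory are all illustrative, and the YAML file is assumed to be parsed with PyYAML.

```python
import logging
import logging.config
import os

# Assumption: the file name matches the one passed via --files and the file
# is in the current working directory. The real logging_utils.py may differ.
CONFIG_FILE = "log.properties.yml"

_configured = False  # flag so configuration is attempted at most once per process


def configure_logging(config_file=CONFIG_FILE):
    """Apply the YAML config if it exists; return True if it was applied."""
    global _configured
    if _configured:
        return False
    _configured = True
    if not os.path.isfile(config_file):
        return False  # no config file: leave Python's default logging alone
    import yaml  # PyYAML, imported lazily so it is only needed when a config exists
    with open(config_file) as f:
        logging.config.dictConfig(yaml.safe_load(f))
    return True


def get_logger(name):
    """Return a logger, configuring logging first as a side-effect."""
    configure_logging()
    return logging.getLogger(name)
```

Object model code would then call get_logger(__name__) where it previously called getLogger('py4j'). Keeping the flag at module level means each executor process reads the file at most once.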
The logging branch contains a simple implementation. See:
- configs/log.properties.yml
- defoe/logging_utils.py
- defoe/papers/issue.py
- docs/run-queries.md (Configure logging)