[Investigate] How to make logs more useful
When we run multiple workspaces, how do we make the logs more explicit? One idea could be to add the current workspace to every log/printf line.
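A rough sketch of that idea using Log4j2's ThreadContext (the API is real; how workspaceName gets resolved is a placeholder, not something we have decided):
%scala
import org.apache.logging.log4j.ThreadContext

// Sketch: tag log lines with the current workspace via the thread context map.
// "workspaceName" is a placeholder for however we resolve the workspace
// (cluster tag, Spark conf, environment variable, ...).
val workspaceName = "my-workspace"
ThreadContext.put("workspace", workspaceName)

// A PatternLayout can then print it on every line via the %X lookup, e.g.:
//   %d{yy/MM/dd HH:mm:ss} %p %c{1}: [%X{workspace}] %m%n
Caveat: ThreadContext is per thread, so executor task threads would each need it set; doing this at the layout/config level may be more robust.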
Log4j was upgraded from 1.x to 2.x ("Log4j2") ~DBR 11.0
On DBR 13.3:
Driver config: /databricks/spark/dbconf/log4j/driver/log4j2.xml
Executor config: /databricks/spark/dbconf/log4j/executor/log4j2.xml
(both files live on the driver's host FS)
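Both files can be inspected from a notebook, e.g.:
%scala
// Print the shipped driver config (the executor file can be read the same way,
// since both paths are on the driver's host FS)
import sys.process._
println("cat /databricks/spark/dbconf/log4j/driver/log4j2.xml".!!)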
Log4j2 docs
from "Configuration with XML":
Log4j can be configured using two XML flavors; concise and strict.
It may be important to understand Log4j2's Automatic Configuration logical flow. TBD.
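In the meantime, a quick way to see which configuration the automatic flow actually selected (sketch, driver side):
%scala
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.core.LoggerContext

// The ConfigurationSource records where the active config was loaded from
val ctx = LogManager.getContext(false).asInstanceOf[LoggerContext]
println(ctx.getConfiguration().getConfigurationSource())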
Because the logging configuration that ships with a DBR can change, let's consider the approach from "Initialize Log4j by Combining Configuration File with Programmatic Configuration":
"Sometimes you want to configure with a configuration file but do some additional programmatic configuration. A possible use case might be that you want to allow for a flexible configuration using XML but at the same time make sure there are a few configuration elements that are always present that can't be removed.
The easiest way to achieve this is to extend one of the standard Configuration classes (XmlConfiguration, JSONConfiguration) and then create a new ConfigurationFactory for the extended class. After the standard configuration completes the custom configuration can be added to it."
(See example shown there.)
This should be cleaner and easier to debug than replacing or appending to the built-in, file-based config.
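For reference, a rough Scala transliteration of the docs' Java example (the class names here are made up, and this is untested on DBR):
%scala
import org.apache.logging.log4j.core.LoggerContext
import org.apache.logging.log4j.core.config.{Configuration, ConfigurationFactory, ConfigurationSource, Order}
import org.apache.logging.log4j.core.config.plugins.Plugin
import org.apache.logging.log4j.core.config.xml.XmlConfiguration

// Hypothetical subclass: the XML file is processed first, then doConfigure()
// appends whatever elements must always be present.
class GuardedXmlConfiguration(ctx: LoggerContext, source: ConfigurationSource)
    extends XmlConfiguration(ctx, source) {
  override protected def doConfigure(): Unit = {
    super.doConfigure()
    // Programmatic additions go here, e.g. addLogger(...) for entries that
    // must survive any edit to the shipped log4j2.xml
  }
}

// Factory that makes Log4j2 build the subclass instead of plain XmlConfiguration
@Plugin(name = "GuardedXmlConfigurationFactory", category = "ConfigurationFactory")
@Order(50)
class GuardedXmlConfigurationFactory extends ConfigurationFactory {
  override protected def getSupportedTypes(): Array[String] = Array(".xml", "*")
  override def getConfiguration(ctx: LoggerContext, source: ConfigurationSource): Configuration =
    new GuardedXmlConfiguration(ctx, source)
}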
There are some relevant code snippets in our internal wiki that I need to study. Here they are for convenience:
Using Scala commands:
For package level:
%scala
import org.apache.logging.log4j.{Level, LogManager}
import org.apache.logging.log4j.core.{Logger, LoggerContext}
import org.apache.logging.log4j.core.config.LoggerConfig
import org.apache.logging.log4j.util.StackLocatorUtil
import scala.concurrent.duration._

// Include packages of interest
val packages = Seq(
  "com.amazonaws",
  "com.databricks.common.filesystem.LokiFileSystem",
  "shaded.databricks.org.apache.hadoop.fs",
  "com.databricks.sql.managedcatalog"
)

// Change the logging level
val level = Level.TRACE

// Setting driver only: register a LoggerConfig per package, then refresh
val context = LogManager.getContext(false).asInstanceOf[LoggerContext]
val config = context.getConfiguration()
for (p <- packages) {
  config.addLogger(p, new LoggerConfig(p, level, true))
  config.getLoggers().get(p).setLevel(level)
}
context.updateLoggers()

// Setting executors: apply the same change on each executor JVM
// (sc.runOnEachExecutor is a Databricks-internal API)
for (loggerPackage <- packages) {
  sc.runOnEachExecutor[Unit](() => {
    val loggerName = loggerPackage
    // Look up the logger so it exists in the executor's context
    val log4jLogger = LogManager.getLogger(loggerName).asInstanceOf[Logger]
    val loggerContext = LogManager
      .getContext(StackLocatorUtil.getCallerClassLoader(3), false)
      .asInstanceOf[LoggerContext]
    val config = loggerContext.getConfiguration()
    val loggerConfig = config.getLoggerConfig(loggerName)
    loggerConfig.setLevel(level)
    loggerContext.updateLoggers()
  }, 5.seconds)
}
For root logger level:
%scala
import org.apache.logging.log4j.{Level, LogManager}
import org.apache.logging.log4j.core.config.Configurator

val level = Level.DEBUG // Change as needed
Configurator.setAllLevels(LogManager.getRootLogger().getName(), level)
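A quick sanity check after changing the level (the logger name is arbitrary):
%scala
import org.apache.logging.log4j.LogManager

// With the root logger at DEBUG, this line should now appear in the driver log
LogManager.getLogger("levelCheck").debug("root logger is at DEBUG")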
My next move is to compare these with the examples I linked to in the previous comment ☝ to understand the relevant APIs better.