[Investigate] How to make logs more useful

Open gueniai opened this issue 1 year ago • 3 comments

When we run against multiple workspaces, how do we make the logs more explicit? One idea could be to add the current workspace to every log/printf line.
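
For reference, Log4j2's ThreadContext map (the MDC) is one way to get such a tag onto every line: put the workspace into the context once, then render it with %X{workspace} in the appender's layout pattern. A minimal sketch, with the workspace-id lookup left as an assumption:

%scala

import org.apache.logging.log4j.ThreadContext

// Assumption: resolve the workspace identifier however the deployment
// already does; it is hard-coded here purely for illustration.
val workspaceId = "my-workspace"
ThreadContext.put("workspace", workspaceId)

// Every subsequent log event on this thread carries the key; rendering it
// still requires adding %X{workspace} to the PatternLayout in the config.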

gueniai avatar Mar 20 '24 13:03 gueniai

Log4j was upgraded from 1.x to 2.x ("Log4j2") around DBR 11.0.

On DBR 13.3 the configs live at:

Driver config: /databricks/spark/dbconf/log4j/driver/log4j2.xml

Executor config: /databricks/spark/dbconf/log4j/executor/log4j2.xml

(both on the driver's host FS)
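
For convenience, the shipped config can be dumped straight from a notebook; a minimal sketch, assuming the driver-side path above is readable:

%scala

// Print the runtime's driver-side Log4j2 config without leaving the notebook.
import scala.io.Source

val src = Source.fromFile("/databricks/spark/dbconf/log4j/driver/log4j2.xml")
try println(src.mkString) finally src.close()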

From the Log4j2 docs, "Configuration with XML":

Log4j can be configured using two XML flavors; concise and strict.

It may be important to understand Log4j2's Automatic Configuration logical flow. TBD.
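
One way to ground that is to ask the live LoggerContext what automatic configuration actually loaded; a sketch, assuming these accessors are present in the DBR's Log4j2 version:

%scala

import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.core.LoggerContext

// Which configuration source did automatic configuration settle on?
val ctx = LogManager.getContext(false).asInstanceOf[LoggerContext]
println(ctx.getConfiguration().getConfigurationSource().getLocation())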

neilbest-db avatar Mar 26 '24 22:03 neilbest-db

Because the logging configuration that ships with a DBR can change, let's consider this approach to adding new elements to the config, from "Initialize Log4j by Combining Configuration File with Programmatic Configuration":

"Sometimes you want to configure with a configuration file but do some additional programmatic configuration. A possible use case might be that you want to allow for a flexible configuration using XML but at the same time make sure there are a few configuration elements that are always present that can't be removed.

The easiest way to achieve this is to extend one of the standard Configuration classes (XmlConfiguration, JSONConfiguration) and then create a new ConfigurationFactory for the extended class. After the standard configuration completes the custom configuration can be added to it."

(See example shown there.)

This should be cleaner and easier to debug than replacing or appending to the built-in, file-based config.
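
For discussion, here is a rough Scala sketch of that pattern against the Log4j2 2.7+ API (where the factory methods take a LoggerContext). The class names, target package, and @Order value are hypothetical:

%scala

import org.apache.logging.log4j.Level
import org.apache.logging.log4j.core.LoggerContext
import org.apache.logging.log4j.core.config.{Configuration, ConfigurationFactory, ConfigurationSource, LoggerConfig, Order}
import org.apache.logging.log4j.core.config.plugins.Plugin
import org.apache.logging.log4j.core.config.xml.XmlConfiguration

// Let the shipped XML configure everything as usual, then guarantee our
// logger is present regardless of what the DBR's file contains.
class OverwatchXmlConfiguration(ctx: LoggerContext, source: ConfigurationSource)
    extends XmlConfiguration(ctx, source) {

  override protected def doConfigure(): Unit = {
    super.doConfigure()
    val name = "com.databricks.labs.overwatch" // hypothetical package of interest
    addLogger(name, new LoggerConfig(name, Level.INFO, true))
  }
}

// Factory that substitutes the subclass for the stock XmlConfiguration.
@Plugin(name = "OverwatchConfigurationFactory", category = ConfigurationFactory.CATEGORY)
@Order(50)
class OverwatchConfigurationFactory extends ConfigurationFactory {

  override protected def getSupportedTypes(): Array[String] = Array(".xml", "*")

  override def getConfiguration(ctx: LoggerContext, source: ConfigurationSource): Configuration =
    new OverwatchXmlConfiguration(ctx, source)
}

Getting the factory registered before the first LoggerContext initializes (e.g. via ConfigurationFactory.setConfigurationFactory or the corresponding system property) is its own wrinkle on a DBR and still needs investigation.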

neilbest-db avatar Mar 27 '24 20:03 neilbest-db

There are some relevant code snippets in our internal wiki that I need to study. Here they are for convenience:

Using Scala commands:

For the package level:

%scala

import org.apache.logging.log4j.{Level, LogManager}
import org.apache.logging.log4j.core.{Logger, LoggerContext}
import org.apache.logging.log4j.core.config.LoggerConfig
import org.apache.logging.log4j.util.StackLocatorUtil
import scala.concurrent.duration._

// Packages of interest
val packages = Seq("com.amazonaws", "com.databricks.common.filesystem.LokiFileSystem", "shaded.databricks.org.apache.hadoop.fs", "com.databricks.sql.managedcatalog")
// Target logging level
val level = Level.TRACE

// Driver only: register a LoggerConfig for each package on the live context
val context = LogManager.getContext(false).asInstanceOf[LoggerContext]
val config = context.getConfiguration()
for (p <- packages) {
  config.addLogger(p, new LoggerConfig(p, level, true))
  config.getLoggers().get(p).setLevel(level)
}
context.updateLoggers()

// Executors: apply the same change inside each executor's LoggerContext
// (sc.runOnEachExecutor is a Databricks-internal SparkContext extension)
for (loggerPackage <- packages) {
  sc.runOnEachExecutor[Unit](() => {
    val loggerName = loggerPackage
    val log4jLogger = LogManager.getLogger(loggerName).asInstanceOf[Logger]
    val loggerContext = LogManager.getContext(StackLocatorUtil.getCallerClassLoader(3), false).asInstanceOf[LoggerContext]
    val config = loggerContext.getConfiguration()
    val loggerConfig = config.getLoggerConfig(loggerName)
    loggerConfig.setLevel(level)
    loggerContext.updateLoggers()
  }, 5.seconds)
}
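
After running the package-level cell, a quick probe on the driver shows whether the change took (using one of the packages from the Seq above):

%scala

import org.apache.logging.log4j.LogManager

// Expect TRACE after the cell above; the trace call should now reach the driver log.
val probe = LogManager.getLogger("com.amazonaws")
println(probe.getLevel())
probe.trace("overwatch log-level probe")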

For the root logger level:

%scala

import org.apache.logging.log4j.{Level, LogManager}
import org.apache.logging.log4j.core.config.Configurator

val level = Level.DEBUG // Change as needed
// Sets the root logger and all descendant loggers to the new level
Configurator.setAllLevels(LogManager.getRootLogger().getName(), level)

My next move is to compare these with the examples I linked to in the previous comment ☝ to understand the relevant APIs better.

neilbest-db avatar Apr 03 '24 18:04 neilbest-db