
Agent log output in JSON?

Open dpratt opened this issue 5 years ago • 25 comments

We run all our services in docker containers in a k8s cluster, as I suspect quite a few Datadog users do, and as such we have our standard boilerplate set to emit all logs from all sources (channeled through slf4j) as properly formatted single line JSON. This allows for the Datadog agent to pick up and properly index logs and deal with things like stack traces in a sane fashion.

For the most part, this works quite well, except for log messages printed to standard out by the Datadog Java agent. Since the slf4j infrastructure at that level is loaded and configured in a different classloader, we don't really have any way to influence its output - it appears to just be using the stock simple slf4j logger implementation, which only really allows for coarse-grained configuration of log levels.

Is there a mechanism by which we could have the java agent use logback or some other slf4j binding that outputs JSON instead? We have people getting woken up at all hours of the night due to errors generating alerts because the DD log infrastructure interprets anything that can't be parsed as JSON at the ERROR level. We've mitigated a bunch of this by adding -Ddatadog.slf4j.simpleLogger.defaultLogLevel=WARN to our JVM launch parameters, but there's still a bunch of stuff that occasionally gets through.
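To make the mismatch concrete, here is a minimal illustrative sketch (not dd-trace-java code; the class and exact line formats are assumptions) contrasting a plain-text tracer-style line, which a JSON log pipeline cannot parse and therefore flags as ERROR, with the single-line JSON record such a pipeline can index:

```java
import java.time.Instant;

// Illustrative only: shows the shape of the two kinds of log lines,
// not the tracer's actual formatting code.
public class LogLineDemo {
    // A plain-text line in the rough style the tracer's simple logger emits.
    static String plainLine(String level, String logger, String message) {
        return String.format("[dd.trace %s] %s %s - %s",
                Instant.ofEpochMilli(0), level, logger, message);
    }

    // A single-line JSON record in the shape a JSON log pipeline can index.
    // Naive escaping; a real JSON encoder handles this properly.
    static String jsonLine(String level, String logger, String message) {
        return String.format(
                "{\"timestamp\":\"%s\",\"level\":\"%s\",\"logger\":\"%s\",\"message\":\"%s\"}",
                Instant.ofEpochMilli(0), level, logger, message.replace("\"", "\\\""));
    }

    public static void main(String[] args) {
        System.out.println(plainLine("WARN", "datadog.trace.agent", "something happened"));
        System.out.println(jsonLine("WARN", "datadog.trace.agent", "something happened"));
    }
}
```

An application logger with a JSON encoder (e.g. logstash-logback-encoder) produces the second shape; the agent's bundled simple logger only produces the first, which is the root of the complaint.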

dpratt avatar Nov 22 '19 05:11 dpratt

Apologies for the delay @dpratt. Can you open a support ticket so we can properly track this feature request? Thanks.

jeremy-lq avatar Dec 27 '19 16:12 jeremy-lq

I'm interested in this as well. May I point out that the DD docs recommend logging in Logstash JSON format, but the same can't be accomplished easily with the DD agent...

svetozar02 avatar Jan 08 '20 13:01 svetozar02

Is there an update that you might be able to share @dpratt? I have the same issue.

matt-canty-dragon avatar Apr 02 '20 13:04 matt-canty-dragon

@jeremy-lq has this been implemented? Is there another ticket/issue where this is being tracked?

raj-saxena avatar Jul 15 '20 15:07 raj-saxena

@svetozar02 cannot agree more with your statement. DD makes a recommendation in the docs and then completely contradicts it with its own software.

philippneugebauer avatar Aug 24 '21 11:08 philippneugebauer

Supporting development of this feature (which I see as a bug) - for the reasons others have described. Our monitoring team has to spend time looking at every alert, and these are fruitless ones. We are increasing the risk of alert fatigue as well.

asharma3007 avatar Nov 17 '21 10:11 asharma3007

This has become a huge issue for us enabling Datadog application security. Enabling it means you get potential logs like:

WARN com.datadog.appsec.gateway.AppSecRequestContext - xxx

You can clearly see in the log that it's a warning, in this case about a potential security issue, but I quote from the support case: "It looks like these logs may be going to stderr, is that right? If so, we assign the ERROR status to all stderr logs by default (noted in the Important Notes here: https://docs.datadoghq.com/agent/docker/log/?tab=hostagent#installation )."

This means our applications are now flooded with error logs that we can't really do anything with.

robinglen avatar Jan 28 '22 10:01 robinglen

@thearegee I've seen the support case you are talking about. I'm sorry for the bad experience. There was an AppSec issue that caused a WARN message to be spammed for every request. It is now fixed and will be available in the upcoming release (today). In addition, we are going to rework the logging system to make it tidier in future releases.

ValentinZakharov avatar Jan 28 '22 11:01 ValentinZakharov

@ValentinZakharov thank you, that's great news, and thanks for getting this in so quickly. We will upgrade today and see how it looks. If we can help with the logging work, please let me know.

robinglen avatar Jan 28 '22 12:01 robinglen

Hi @ValentinZakharov! Any update on the logging system rework?

franciscosantana avatar Apr 18 '22 13:04 franciscosantana

> Any update about the rework on logging system?

Any movement on this? Three years later and we're still having to configure the Datadog agent to log to System.out so as not to spam fake errors into the Datadog log... which is kinda funny/annoying.

jon-p avatar Jul 25 '22 12:07 jon-p

I'm sorry for the delay. The rework of the current logging system has been postponed due to its critical impact on existing customers. However, we are trying to address customers' issues with minor fixes. @jon-p If I understood the problem correctly: because the Java tracer sends messages to stderr, they are detected as errors in the Datadog dashboard, even if they are just informational messages. If I'm correct, then you can try configuring the Java tracer logs to be written to a dedicated log file with the option:

-Ddatadog.slf4j.simpleLogger.logFile=<NEW_LOG_FILE_PATH>

Ref: documentation. In this case, I suppose, the Agent should ignore messages from the Java tracer.

Please let me know if that doesn't solve the problem.

ValentinZakharov avatar Aug 02 '22 15:08 ValentinZakharov

The solution above doesn't really solve the problem, in that if you redirect the log messages to a file, they'll likely just be ignored since they're not going to stderr/stdout. Additionally, if you're running in a container, this is likely an anti-pattern since you'll be writing to an ephemeral file that will eventually just be discarded when your container exits. The real solution here would be to do one of two things -

  • Have dd-java-agent accept a configuration option (perhaps datadog.slf4j.logFormat?) that would allow you to change the default output to well-formatted JSON instead of raw text, which would make it output logs in the way that Datadog itself recommends as a best practice.
  • Add a default configuration to the datadog log pipelines that would detect messages of this specific format and have a default glob pattern that extracts the level, message and timestamp so that things 'just work' in the default out-of-the-box configuration. Note: I do not like this solution either.

dpratt avatar Aug 02 '22 16:08 dpratt

@dpratt thanks for sharing the details of your proposed solution. I've discussed the details of the feature request with colleagues. Unfortunately, due to limited resources, new features are implemented in accordance with the priorities set by the product management team, and priorities mostly depend on the number and size of the customers affected. At the moment, the logger improvements are in the backlog with low priority, so I cannot provide any time estimate for this feature.

ValentinZakharov avatar Aug 15 '22 12:08 ValentinZakharov

Just in case it helps anyone else -- Our office has also filed a feature request ticket with DD Support (892588) providing a link to this GitHub issue. No new timing or announcements are available, but we've added our voice in support of a solution.

bkmjaris avatar Aug 22 '22 17:08 bkmjaris

I'm adding our account to the list of affected customers who would like to see a resolution.

Given how hard it is to build dd-trace-java on our side, and the fact that the agent's classloader is fully locked, short of monkeypatching classfiles we have no way of getting properly structured logs out of the APM agent.

mrzor avatar Nov 18 '22 16:11 mrzor

We also stumbled over this in our trial phase. Datadog recommends JSON logging in its own docs, but then the agent logs get marked as errors in the Datadog web UI.

der-eismann avatar Feb 08 '23 12:02 der-eismann

Does setting the DD_LOG_FORMAT_JSON environment variable or the log_format_json setting in datadog.yml output the desired structured logs? Those affect agent logs only (e.g. https://hub.docker.com/r/datadog/agent) -- it's up to individual services/applications to format their own logs. Afaik there's no "java"-only agent that the OP is referring to?

https://github.com/DataDog/datadog-agent/blob/a3d970f0eb8359f7a4676768a91f0165e8203887/pkg/config/config_template.yaml#L2050-L2054

briceburg avatar Mar 07 '23 05:03 briceburg

@briceburg, I assume the above posters are talking about the dd-java-agent.jar that the documentation instructs you to download and run with your Java app: https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/java/#add-the-java-tracer-to-the-jvm Calling it the "java agent" seems reasonable to me, although it might not be the correct term. And yes, I've tried setting the DD_LOG_FORMAT_JSON=true environment variable, the yml config, and a few other things, all to no avail.

greg-p-slingshot avatar Mar 07 '23 18:03 greg-p-slingshot

OK yes, I struggle with this too across the various ddtrace implementations.

In an effort to "have all logs in JSON", I reduced the output of ddtrace to WARNING by setting the DD_TRACE_LOG_FILE_LEVEL=WARNING environment variable wherever it gets included (e.g. the dd-java-agent.jar; why that is not called dd-java-trace.jar is another question ;)), and then Datadog seems to do a good job in its log processors of properly tagging the message; but yeah, it's super annoying that it's not in JSON.

briceburg avatar Mar 07 '23 19:03 briceburg

When can we expect this problem to be fixed? 😞

I have tried DD_TRACE_LOG_FILE_LEVEL=WARNING as mentioned by @briceburg, but I still see info logs.

jakubkalicki-tink avatar Jun 06 '23 08:06 jakubkalicki-tink

I was able to configure the java agent log messages to go to stdout and also get the messages correctly parsed by Datadog.

TL;DR

If you just want Datadog to stop identifying the agent log messages as errors, add the following when running your application via java -jar:

-Ddatadog.slf4j.simpleLogger.logFile=System.out

If you also want these logs to get parsed by Datadog, add the following:

-Ddatadog.slf4j.simpleLogger.dateTimeFormat="yyyy-MM-dd HH:mm:ss.SSS"

Explanation:

About logs going to stderr:

The default value for datadog.slf4j.simpleLogger.logFile is System.err, which causes logs to go to stderr:

  • https://github.com/DataDog/dd-trace-java/blob/11e5dfa116caf6d49d7f674107bde55b2390c7db/dd-java-agent/agent-logging/src/main/java/datadog/trace/logging/simplelogger/SLCompatSettings.java#L64

Changing this property to use System.out fixes it.
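As a rough illustration of the mechanism (a sketch, not the tracer's actual code; class and property names here are invented for the example), a simple logger typically resolves its output stream from a configuration value, which is why pointing the property at System.out changes where the lines land:

```java
import java.io.PrintStream;

// Sketch of how a logFile-style setting selects the output stream.
// Not dd-trace-java source; see SLCompatSettings in the repo for the real logic.
public class StreamFromProperty {
    static PrintStream resolve(String value) {
        // "System.err" is the default, which is what makes container
        // pipelines tag every agent line as ERROR; "System.out" avoids that.
        if ("System.out".equals(value)) {
            return System.out;
        }
        return System.err;
    }

    public static void main(String[] args) {
        // "demo.logFile" is a made-up property name for this sketch.
        String configured = System.getProperty("demo.logFile", "System.err");
        resolve(configured).println("WARN demo.Logger - hello");
    }
}
```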

About getting Datadog to parse the logs

The default for datadog.slf4j.simpleLogger.dateTimeFormat is '[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']':

  • https://github.com/DataDog/dd-trace-java/blob/11e5dfa116caf6d49d7f674107bde55b2390c7db/dd-java-agent/agent-bootstrap/src/main/java/datadog/trace/bootstrap/Agent.java#L73

The default date format causes the agent log messages not to match any of Datadog's default parsing rules for Java logs:

Simply changing the log date format to yyyy-MM-dd HH:mm:ss.SSS is enough for one of the default formats to parse the message.
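The difference is tiny: the tracer's default pattern puts a colon before the milliseconds and appends a zone offset, while the pipeline-friendly pattern uses a dot. A quick SimpleDateFormat sketch shows the two renderings side by side:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TracerTimestampDemo {
    // Render an epoch instant with a given SimpleDateFormat pattern in UTC.
    static String format(String pattern, long epochMillis) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // Tracer default: colon before millis, plus a zone offset.
        System.out.println(format("yyyy-MM-dd HH:mm:ss:SSS Z", 0L)); // 1970-01-01 00:00:00:000 +0000
        // Pattern accepted by Datadog's default Java parsing rules.
        System.out.println(format("yyyy-MM-dd HH:mm:ss.SSS", 0L));   // 1970-01-01 00:00:00.000
    }
}
```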

gabrielhof avatar Jun 13 '23 21:06 gabrielhof

Another workaround (only for users of the Datadog Log Explorer feature) is to create a new Grok processor in your Datadog log pipeline. You can also add new rules to the existing default Java processors.

I added the same rules as the default Java processor and updated the date pattern.

Rule:

irregular_dd_format \[dd\.trace %{_irregular_date}\]\s+\[%{_thread_name}\]\s+%{_status}\s+%{_logger_name}\s*(%{_context}\s*)?- (?>%{word:dd.trace_id} %{word:dd.span_id} - )?%{data:message}((\n|\t)%{data:error.stack})?

Helper rule:

_irregular_date %{date("yyyy-MM-dd HH:mm:ss:SSS Z"):timestamp}
_duration %{integer:duration}
_thread_name %{notSpace:logger.thread_name}
_status %{word:level}
_logger_name %{notSpace:logger.name}
_context %{notSpace:logger.context}
_line %{integer:line}

ls-rein-martha avatar Jun 19 '23 16:06 ls-rein-martha

Thank you @gabrielhof. I think that the system property should not have quotes, or if it does, then you should quote the whole thing. At least for me in a Dockerfile, the quotes just caused literal quotes to appear around the timestamp.

i.e., I think changing this:

-Ddatadog.slf4j.simpleLogger.dateTimeFormat="yyyy-MM-dd HH:mm:ss.SSS"

to

"-Ddatadog.slf4j.simpleLogger.dateTimeFormat=yyyy-MM-dd HH:mm:ss.SSS"

SJrX avatar Oct 13 '23 23:10 SJrX