
Enable configuring ambassador STDOUT logs as JSON format

Open · davidalpert opened this issue on Oct 09 '20 · 10 comments

Please describe your use case / problem.

After recently setting the envoy_log_type: json option as part of enabling Datadog APM tracing, most access logs are written as JSON. It appears, however, that many log messages are still written to STDOUT as plain text.
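
For reference, envoy_log_type is set on the Ambassador Module; a minimal sketch of the relevant resource (the metadata below is illustrative, not copied from a real cluster):

apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
  namespace: ambassador   # illustrative namespace
spec:
  config:
    envoy_log_type: json

As far as I can tell this option only changes the format of Envoy's access logs, which matches what we see: the other processes in the pod keep logging plain text.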

This was unexpected and makes it significantly more complicated to define a log pipeline configuration which can parse all the different log message formats.

Describe the solution you'd like

A single option for configuring the log format of the ambassador pods, with the ability to configure all pod logs as JSON, would be preferred.

Describe alternatives you've considered

As a workaround I will experiment with additional logging configuration to support the multiple patterns coming from a single pod.

Additional context

Here are some examples of log lines which I was surprised came through as text rather than JSON after configuring envoy_log_type: json. They were intermingled with the JSON access log messages and didn't parse.

2020-10-09 14:37:24 diagd 1.5.1 [P383TThreadPoolExecutor-0_8] INFO: 9B8F4961-FB07-4012-A5E9-484B8A520C3D: <ipaddress> "GET /metrics" 124ms 200 success
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/wire_format_lite.cc:578] String field 'google.protobuf.Value.string_value' contains invalid UTF-8 data when serializing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
E1008 21:09:38.924448     390 reflector.go:123] k8s.io/[email protected]/tools/cache/reflector.go:96: Failed to list <nil>: Get https://<ipaddress>:443/apis/getambassador.io/v2/namespaces/<namespace>/kubernetesendpointresolvers?limit=500&resourceVersion=0: dial tcp <ipaddress>:443: connect: no route to host
Trace[934878924]: [30.000552118s] [30.000552118s] END

davidalpert · Oct 09 '20

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Dec 11 '20

afaik this is still an issue

davidalpert · Dec 21 '20

Looks like the error log and many other logs still use the text format.

clguo-tw · Jan 13 '21

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Mar 19 '21

If this issue has not yet been fixed, then it is not stale; it is active.

davidalpert · Apr 01 '21

FWIW, we send all our logs to Datadog. If the logs are NOT JSON formatted, then Datadog treats them as level=ERROR, and it is not ideal to see thousands of error logs from Ambassador every hour.

We have done the following to help mitigate a lot of the noisy "error" logs from Ambassador.

  • Use envoy_log_type: json in our ambassador module configuration
  • Create a custom Datadog log pipeline for the diagd log messages. Using a grok parser, the rule below did the trick for us.
aes.diagd %{date("yyyy-MM-dd HH:mm:ss"):date} diagd %{regex("[0-9.]*"):version} \[%{data:process_thread}\] %{word:level}: %{data:message}
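
For illustration, running that rule against the diagd sample line from the original report should extract roughly these attributes (the field names follow the rule above; the exact rendering inside Datadog may differ):

  date: 2020-10-09 14:37:24
  version: 1.5.1
  process_thread: P383TThreadPoolExecutor-0_8
  level: INFO
  message: 9B8F4961-FB07-4012-A5E9-484B8A520C3D: <ipaddress> "GET /metrics" 124ms 200 success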

We still have quite a few "snapshot" activity logs that we would also like to format as JSON. Creating a custom log pipeline in Datadog would not be ideal, because there is not much of a pattern for these messages and there is no log level in them. (And we'd rather treat the disease, not the symptom).

Examples:

PID 160, 0.06Gi (exited): envoy --config-path /ambassador/snapshots/econf-tmp.json --mode validate 

PID 37, 0.07Gi: /usr/bin/python /usr/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1

PID 19, 0.14Gi: /ambassador/sidecars/amb-sidecar

2021/04/19 18:40:13 Memory Usage 1.63Gi (41%)

2021/04/19 18:40:01 Taking K8sSecret snapshot.SecretRef{Namespace:"ambassador", Name:"fallback-self-signed-cert"}

jtyberg · Apr 19 '21

We ran into this exact situation; it makes our logs look terrible and can trigger alerts. Is there any movement on fixing this? I feel like this should be treated as a critical bug.

FriedCircuits · May 18 '21

I think there are 2 band-aid solutions that might be relevant here (rough sketches follow the list):

  1. Set AES_LOG_LEVEL: error in the container's env variables (https://www.getambassador.io/docs/edge-stack/latest/topics/running/environment/). Assuming you're running a version later than 1.5.4, this will suppress the vast majority of logs produced by the Ambassador control plane.

  2. Set an Envoy log path separate from stdout using the Module (https://www.getambassador.io/docs/edge-stack/latest/topics/running/ambassador/#envoy). Then you can point Datadog at that log path for access logs and handle stdout completely separately, since it will now only contain control-plane logs.
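
Rough sketches of both, in case they help; the resource names, namespace, and file path below are placeholders rather than tested configuration:

# 1. Excerpt from the ambassador container spec in the Deployment
env:
  - name: AES_LOG_LEVEL
    value: "error"

# 2. Module config sending Envoy access logs to a file instead of stdout
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
  namespace: ambassador   # placeholder namespace
spec:
  config:
    envoy_log_type: json
    envoy_log_path: /tmp/ambassador/access.log   # placeholder path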

I'm admittedly pretty ignorant about how Datadog handles log gathering, so I'm not sure whether #2 will actually work the way I envision; maybe someone with more experience can expand on that idea.

cakuros · May 18 '21

I think the 2nd option can work, but the file has to be mounted into the agent container. I am going to try option 1. All we really need are error logs and Envoy traffic logs that we can link to traces.
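
For anyone who does go the file route, the agent side would be a custom log-collection entry dropped into the agent's conf.d, roughly like the sketch below (path, service, and source are placeholders, and the file has to sit on a volume the agent can actually read, e.g. a shared hostPath):

logs:
  - type: file
    path: /tmp/ambassador/access.log   # placeholder, must match envoy_log_path
    service: ambassador
    source: envoy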

For anyone else: it seems that even with AES_LOG_LEVEL: error you still get the memory usage logs.

Thanks for everyone's help, and I hope this helps the OP too.

FriedCircuits · May 19 '21

I worked on https://github.com/datawire/ambassador/pull/3437 recently and was able to address JSON logging for most of the Ambassador control plane code. Unfortunately, it still doesn't cover the logging emitted by our Kubernetes client-go dependencies, and it probably won't cover the protobuf CPP warning/error messages. I think those might be coming from within Envoy... I'd need to dig deeper.

This change should, however, improve life significantly for those who want "almost all JSON logs", and it should reduce the noise to a manageable minimum until we can track down and address the remaining rogue logs.

esmet · May 20 '21