
Allow Logging format to be changed

gallowaystorm opened this issue 3 years ago · 14 comments

Is your feature request related to a problem? Please describe.

Yes, it is related to a problem. When sending the logs to Datadog, all logs show as errors because I cannot switch them to a format that Datadog accepts. As a result, every log line is reported as an error even when it is only an info-level log.

Describe the solution you'd like

The ability to change the format of the logs.

Describe alternatives you've considered

none

What version of descheduler are you using?

descheduler version: Helm 0.24.1

Additional context

gallowaystorm avatar Sep 27 '22 17:09 gallowaystorm

Hi @gallowaystorm, is the --logging-format flag possibly helpful? That supports text and json logging output.

damemi avatar Sep 27 '22 18:09 damemi
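
For reference, the flag can also be passed directly to the binary; a minimal sketch (other flags such as the policy config are omitted, and the --v value is just an illustrative verbosity level):

# default human-readable output
descheduler --logging-format=text --v=3
# structured JSON output
descheduler --logging-format=json --v=3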

How can I use that with the Helm chart?

gallowaystorm avatar Sep 27 '22 19:09 gallowaystorm

helm install ...... --set cmdOptions.logging-format=json

@gallowaystorm This should get you going

❯ helm install test . --dry-run --set cmdOptions.logging-format=json | grep -A2 logging
                - "--logging-format"
                - "json"
                - "--v"

harshanarayana avatar Sep 28 '22 06:09 harshanarayana
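
If you manage values in a file rather than --set flags, the equivalent would be roughly the following values.yaml snippet (a sketch based on the cmdOptions structure visible in the dry-run output above; the chart renders a --v flag by default, so the exact default verbosity may differ):

cmdOptions:
  logging-format: json
  v: 3   # verbosity; adjust or omit to keep the chart default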

Thank you I will try this!

gallowaystorm avatar Sep 29 '22 14:09 gallowaystorm

@harshanarayana, what is the default setting for this?

gallowaystorm avatar Sep 29 '22 14:09 gallowaystorm

Also, that did not help the issue in Datadog. It did change the log format, however.

gallowaystorm avatar Sep 29 '22 14:09 gallowaystorm

@gallowaystorm could you give more details on what the issue in Datadog is? Some sample output or screenshots would help; I don't know whether they are looking for a specific format, or what format you need.

damemi avatar Sep 29 '22 14:09 damemi

Sure, here is a picture of our DD when on the "text" format:

[screenshot: descheduler]

Here it is when I switched to the JSON format:

[screenshot: descheduler-json]

DD usually expects JSON; however, it seems the log level isn't being sent in the JSON format. Because of this, they are all coming in as errors in DD.

gallowaystorm avatar Sep 29 '22 14:09 gallowaystorm

We just use the klog library to write our logs, and it should also handle the JSON formatting, so I'm not sure why it wouldn't be passing the severity.

Could you try setting --skip-headers as well? I found this issue and it looks like maybe the klog headers are ending up in your logs... I still don't see the severity in the JSON format, but this could be a start.

damemi avatar Sep 29 '22 18:09 damemi
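
For context, klog's default text output prefixes every line with a header whose first character encodes the severity (I/W/E/F); --skip-headers drops that prefix and leaves only the message. Roughly what those header lines look like (the file names and messages below are made up for illustration):

I0929 14:05:02.123456       1 descheduler.go:125] "Processing node" node="worker-1"
E0929 14:05:03.654321       1 evictions.go:210] "Error evicting pod" err="pod not found"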

@damemi Might this be related to the discussion on https://github.com/kubernetes-sigs/descheduler/issues/811? At least the stderr part seems related.

harshanarayana avatar Sep 29 '22 19:09 harshanarayana

@harshanarayana ah you may be right, I bumped that issue and will try to figure out what still needs to be done for it

damemi avatar Oct 03 '22 20:10 damemi

@damemi I tried to set the skip-headers flag via our Helm release in Terraform (how we do Helm releases) and it does not seem possible to do so:

resource "helm_release" "descheduler" {
  chart      = local.service_name
  name       = "${local.service_name}-${var.environment}"
  namespace  = local.namespace
  repository = "https://kubernetes-sigs.github.io/descheduler/"
  version    = "0.24.1" # APP VERSION v0.24.1

  set {
    name  = "cmdOptions.skip-headers"
    value = true
  }
}

Log from DD:

Error: unknown command "true" for "descheduler"

https://github.com/kubernetes-sigs/descheduler/blob/0317be1b7672c511f5e2e53acaea1dd5cb74fd77/docs/user-guide.md#cli-options

Seems like that flag does not allow a Boolean

gallowaystorm avatar Oct 04 '22 18:10 gallowaystorm

@gallowaystorm I think the problem is actually closer to https://github.com/kubernetes-sigs/descheduler/issues/811 like @harshanarayana mentioned. But it's worth unsticking this just to see

Running locally, the following commands work:

$ descheduler --skip-headers=true
$ descheduler --skip-headers

but this one gives the unknown command error:

$ descheduler --skip-headers true

Since we don't add a = when we parse cmdOptions, can you try just setting an empty value ("") for this? i.e.

  set {
    name  = "cmdOptions.skip-headers"
    value = ""
  }

damemi avatar Oct 07 '22 13:10 damemi
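
To make the failure mode concrete: the chart renders each cmdOptions key as its own argument and any non-empty value as a separate argument (as seen in the dry-run output earlier in this thread), so the two settings produce roughly these container args (a sketch, not the full rendered manifest):

# value = true: "true" is rendered as a free-standing argument,
# which the CLI treats as a subcommand -> unknown command "true"
- "--skip-headers"
- "true"

# value = "": only the bare flag is rendered, which is what a boolean flag expects
- "--skip-headers"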

That did not make the logging error any better; however, it did make the logs easier to read. I then tried the logtostderr method with:

resource "helm_release" "descheduler" {
  chart      = local.service_name
  name       = "${local.service_name}-${var.environment}"
  namespace  = local.namespace
  repository = "https://kubernetes-sigs.github.io/descheduler/"
  version    = "0.24.1" # APP VERSION v0.24.1

  set {
    name  = "cmdOptions.logtostderr"
    value = "false"
  }
}

This still seemed to not do the trick...

gallowaystorm avatar Oct 07 '22 18:10 gallowaystorm

Any updates on this?

gallowaystorm avatar Oct 17 '22 16:10 gallowaystorm

Hi @gallowaystorm, thanks for bumping this. This issue is in our usage of logging libraries from upstream Kubernetes and their default output streams.

I dug around some more and found this issue which also had to do with DataDog integration: https://github.com/kubernetes-sigs/descheduler/issues/676

It sounds like https://github.com/kubernetes-sigs/descheduler/issues/676#issuecomment-1013178391 describes a Datadog workaround that remaps logs to the right log level by parsing the lines. This might unblock you until we fix our own implementation.

damemi avatar Oct 17 '22 18:10 damemi
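
For anyone who can't chase the link: the general shape of such a workaround (a hypothetical sketch only, not the exact pipeline from #676) is a Datadog Grok Parser rule that extracts the leading klog severity letter into an attribute, for example:

klog_level_rule %{regex("[IWEF]"):level}%{data:rest}

followed by a Log Status Remapper on the extracted level attribute, so lines starting with I/W/E are mapped to info/warn/error instead of all landing as errors. As noted below, this relies on the klog headers being present.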

@gallowaystorm have you tried a combination of logtostderr=false and logging-format=json?

damemi avatar Oct 17 '22 20:10 damemi
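
For reference, setting both options in the Terraform pattern used earlier in the thread would look roughly like this (a sketch; given the cmdOptions rendering discussed above, it may be worth checking whether logtostderr needs the same empty-value treatment as skip-headers):

  set {
    name  = "cmdOptions.logtostderr"
    value = "false"
  }

  set {
    name  = "cmdOptions.logging-format"
    value = "json"
  }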

Yes, I did try that @damemi. Let me look at #676 and see what's happening there.

gallowaystorm avatar Oct 18 '22 17:10 gallowaystorm

That workaround in #676 did the trick for our needs. However, because the workaround depends on the headers being present in the logs, we can no longer remove the headers. Will this be fixed in any future release?

gallowaystorm avatar Oct 19 '22 18:10 gallowaystorm

@gallowaystorm I'm glad that worked for you. Yes, there appears to be some issue in our logging; others have also reported problems with the logs, so we will be discussing this at our next project meeting.

damemi avatar Oct 19 '22 18:10 damemi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 17 '23 19:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 16 '23 20:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 18 '23 20:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 18 '23 20:03 k8s-ci-robot