cluster-logging-operator

LOG-2207: Add policy-based log flow control in CLO

Open Pranjal-Gupta2 opened this issue 2 years ago • 4 comments

Description

"Flow control" refers to how the logging system behaves when logs are produced faster than they can be collected or forwarded. This PR enhances the API to let cluster administrators limit logging rates, or ignore some logs entirely. Logs may still be lost if the collector cannot keep up, but administrators have more control over what is lost, and more predictability of log rates.

Control log rates and overflow policy at two points in the log forwarder:

  • Output: control the flow rate per destination for selected outputs.
    • Limit the rate of outbound logs to match output network and storage capacity.
    • Controls the aggregated (per-destination) output rate.
  • Input: control log flow rates per container for selected containers.
    • Limit the rate of log collection per container for selected groups of containers.
    • Controls individual (per-container) collection throttling.

Note:

  • The limit is applied to the number of records, not bytes.
  • This enhancement does not include a block policy, which would apply back-pressure to containers that exceed rate limits, forcing them to block on stdout/stderr and slow down to stay within the rate limit.
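The two control points and the drop policy can be illustrated with a minimal Python sketch. This is not the collector's implementation: the class names, the fixed one-second window, and the injected `now` timestamp are all assumptions made for illustration, and a real collector may use a different algorithm (for example a token bucket).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DropLimiter:
    """Hypothetical fixed-window limiter that counts records, not bytes."""

    max_records_per_second: int
    _window: int = -1   # index of the one-second window currently being counted
    _count: int = 0     # records accepted in that window
    dropped: int = 0    # records discarded so far

    def allow(self, now: float) -> bool:
        window = int(now)
        if window != self._window:
            # A new one-second window starts: reset the budget.
            self._window = window
            self._count = 0
        if self._count < self.max_records_per_second:
            self._count += 1
            return True
        # "drop" policy: discard the record, never block the producer.
        self.dropped += 1
        return False


class PerContainerLimiter:
    """Input-side limiting: each container gets its own independent budget."""

    def __init__(self, max_records_per_second: int) -> None:
        self.max_records_per_second = max_records_per_second
        self._limiters: Dict[str, DropLimiter] = {}

    def allow(self, container_id: str, now: float) -> bool:
        if container_id not in self._limiters:
            self._limiters[container_id] = DropLimiter(self.max_records_per_second)
        return self._limiters[container_id].allow(now)


if __name__ == "__main__":
    per_container = PerContainerLimiter(max_records_per_second=2)
    # Output-side limiting: one shared (aggregated) budget per destination.
    output = DropLimiter(max_records_per_second=3)
    for container in ["a", "a", "a", "b", "b"]:
        accepted = per_container.allow(container, now=0.0) and output.allow(now=0.0)
        print(container, "forwarded" if accepted else "dropped")
```

In this sketch a noisy container exhausts only its own input budget, while the output limiter caps the combined rate of everything sent to one destination.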

Example: Set a per-container limit for containers with certain labels

  inputs:
  - application:
      selector:
        matchLabels: { importance: low }
      limitPerContainer:
        policy: drop
        maxRecordsPerSecond: 10
  - application:
      selector:
        matchLabels: { importance: high }
      limitPerContainer:
        policy: drop
        maxRecordsPerSecond: 1000
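As a rough illustration of how the example above would assign limits, the following Python sketch picks a per-container limit from a pod's labels. The `INPUTS` layout and the helper names are hypothetical, and the first-match rule is an assumption; the only part taken as given is the usual Kubernetes `matchLabels` meaning, where every listed key/value pair must be present on the pod.

```python
from typing import Dict, List, Optional

# Mirrors the two inputs in the example; this dict layout is illustrative only.
INPUTS: List[dict] = [
    {"matchLabels": {"importance": "low"}, "maxRecordsPerSecond": 10},
    {"matchLabels": {"importance": "high"}, "maxRecordsPerSecond": 1000},
]


def matches(match_labels: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """True when every key/value pair in match_labels is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def limit_for(pod_labels: Dict[str, str], inputs: List[dict] = INPUTS) -> Optional[int]:
    """Return the limit of the first matching input (assumed rule), or None."""
    for spec in inputs:
        if matches(spec["matchLabels"], pod_labels):
            return spec["maxRecordsPerSecond"]
    return None  # no input selects this pod, so no per-container limit applies


if __name__ == "__main__":
    print(limit_for({"importance": "low", "app": "batch"}))
    print(limit_for({"importance": "high"}))
    print(limit_for({"app": "unlabelled"}))
```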

/cc @jcantrill @vimalk78 @eranra
/assign @alanconway

Links

  • Depending on PR(s): NA
  • Bugzilla: NA
  • Github issue: NA
  • JIRA: https://issues.redhat.com/browse/LOG-2207
  • Enhancement proposal: https://issues.redhat.com/browse/LOG-1043

Pranjal-Gupta2 avatar Oct 07 '22 11:10 Pranjal-Gupta2

cc @alanconway @vimalk78

jcantrill avatar Oct 13 '22 19:10 jcantrill

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, Pranjal-Gupta2

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Oct 13 '22 19:10 openshift-ci[bot]

/hold

jcantrill avatar Nov 02 '22 13:11 jcantrill

@Pranjal-Gupta2: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              Commit                                    Details  Required  Rerun command
ci/prow/functional     95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     true      /test functional
ci/prow/e2e-ocp-next   95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     false     /test e2e-ocp-next
ci/prow/e2e            95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     true      /test e2e
ci/prow/e2e-claim-aws  95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     false     /test e2e-claim-aws
ci/prow/unit           95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     true      /test unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Nov 09 '22 00:11 openshift-ci[bot]

/test functional

Pranjal-Gupta2 avatar Jan 09 '23 09:01 Pranjal-Gupta2

/retest

jcantrill avatar May 03 '23 18:05 jcantrill

/retest

jcantrill avatar May 30 '23 19:05 jcantrill

/hold cancel

jcantrill avatar May 30 '23 19:05 jcantrill

/hold

jcantrill avatar May 30 '23 19:05 jcantrill

@Pranjal-Gupta2 make sure to work with @syedriko if needed to ensure the vector changes are included with the image we are using for the 5.8 release

jcantrill avatar May 30 '23 19:05 jcantrill

/test e2e-target

jcantrill avatar Jun 01 '23 13:06 jcantrill

/test functional

jcantrill avatar Jun 01 '23 14:06 jcantrill

/test e2e-target

jcantrill avatar Jun 01 '23 17:06 jcantrill

/test functional

jcantrill avatar Jun 01 '23 19:06 jcantrill

/test e2e

jcantrill avatar Jun 05 '23 16:06 jcantrill

@jcantrill: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

  • /test ci-index-cluster-logging-operator-bundle
  • /test e2e-target
  • /test functional
  • /test images
  • /test lint
  • /test unit

The following commands are available to trigger optional jobs:

  • /test e2e-ocp-target-minus-one
  • /test e2e-ocp-target-minus-two
  • /test functional-target

Use /test all to run all jobs.

In response to this:

/test e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Jun 05 '23 16:06 openshift-ci[bot]

/retest

Merged https://github.com/openshift/cluster-logging-operator/pull/2055 to hopefully resolve one flake.

jcantrill avatar Jun 14 '23 17:06 jcantrill

@Pranjal-Gupta2 you may need to rebase on https://github.com/openshift/cluster-logging-operator/pull/2055, if you have not already, to take advantage of the changes. I'm not certain they fix the issue, but the tests in that PR passed without reproducing the issue at hand.

jcantrill avatar Jun 14 '23 21:06 jcantrill

/retest

alanconway avatar Jun 19 '23 17:06 alanconway

@Pranjal-Gupta2: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                          Commit                                    Details  Required  Rerun command
ci/prow/e2e-ocp-next                               95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     false     /test e2e-ocp-next
ci/prow/e2e                                        95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     true      /test e2e
ci/prow/e2e-claim-aws                              95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20  link     false     /test e2e-claim-aws
ci/prow/ci-bundle-cluster-logging-operator-bundle  381d32668d2a0948d6edff356c79d239b98072dd  link     true      /test ci-bundle-cluster-logging-operator-bundle

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Jul 11 '23 20:07 openshift-ci[bot]

/retest

Pranjal-Gupta2 avatar Jul 11 '23 20:07 Pranjal-Gupta2

/hold /lgtm

jcantrill avatar Jul 12 '23 17:07 jcantrill

/hold cancel

jcantrill avatar Jul 12 '23 17:07 jcantrill