cluster-logging-operator LOG-2207: Add policy-based log flow control in CLO

Description

"Flow control" refers to how the logging system behaves when logs are produced faster than they can be collected or forwarded. This PR enhances the API to let cluster administrators limit logging rates, or ignore some logs entirely. Logs may still be lost if the collector cannot keep up, but administrators have more control over what is lost, and more predictability of log rates.

Control log rates and overflow policy at two points in the log forwarder:

Output: control the flow rate per destination for selected outputs.
- Limits the rate of outbound logs to match the network and storage capacity of the output.
- Controls the aggregated (per-destination) output rate.
Input: control the log flow rate per container for selected containers.
- Limits the rate of log collection, per container, for selected groups of containers.
- Controls individual (per-container) collection throttling.
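
The difference between the two control points can be illustrated with a small sketch. This is not the collector's implementation; `FixedWindowLimiter`, `count_admitted`, and the container names are hypothetical, and a simple fixed one-second window with a drop policy is assumed purely for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Admit at most max_records_per_second records per one-second window, per key."""

    def __init__(self, max_records_per_second):
        self.max_records_per_second = max_records_per_second
        self.counts = defaultdict(int)  # (key, whole second) -> records admitted

    def admit(self, key, timestamp):
        window = (key, int(timestamp))
        if self.counts[window] >= self.max_records_per_second:
            return False  # over the limit: the record is dropped
        self.counts[window] += 1
        return True


def count_admitted(limiter, records, key_of):
    """Count how many (container, timestamp) records the limiter lets through."""
    return sum(1 for container, ts in records if limiter.admit(key_of(container), ts))


# Two hypothetical containers each emit 5 records within the same second.
records = [(name, 0.1 * i) for name in ("app-a", "app-b") for i in range(5)]

# Input-side limit: every container gets its own budget of 3 records/second.
per_container = count_admitted(FixedWindowLimiter(3), records, key_of=lambda c: c)

# Output-side limit: all containers share one budget of 3 records/second.
aggregated = count_admitted(FixedWindowLimiter(3), records, key_of=lambda c: "output")

print(per_container, aggregated)  # prints "6 3"
```

With the same numeric limit, the per-container limiter admits 3 records from each container, while the aggregated limiter admits 3 records in total for the destination.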

Note:

- The limit is applied as a number of records, not bytes.
- This enhancement does not include a block policy, which would apply back-pressure to containers that exceed rate limits, forcing them to block on stdout/stderr and slow down to stay within the rate limit.

Example: Set a per-container limit for containers with certain labels

  inputs:
  - application:
      selector:
        matchLabels: { importance: low }
      limitPerContainer:
        policy: drop
        maxRecordsPerSecond: 10
  - application:
      selector:
        matchLabels: { importance: high }
      limitPerContainer:
        policy: drop
        maxRecordsPerSecond: 1000
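
As a rough illustration of how a container's labels might select one of these limits, the sketch below mirrors the two inputs above as Python dictionaries. The helper names `matches` and `limit_for` are hypothetical, and the "first matching input wins" rule is an assumption made for the example, not behavior confirmed by this PR.

```python
def matches(selector, labels):
    """True when every key/value pair in matchLabels is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def limit_for(inputs, labels):
    """Return maxRecordsPerSecond of the first matching input, else None.

    Assumption: the first matching input wins; the real operator may differ.
    """
    for spec in inputs:
        app = spec["application"]
        if matches(app["selector"], labels):
            return app["limitPerContainer"]["maxRecordsPerSecond"]
    return None


# The two inputs from the example above, written as Python dictionaries.
inputs = [
    {"application": {"selector": {"matchLabels": {"importance": "low"}},
                     "limitPerContainer": {"policy": "drop", "maxRecordsPerSecond": 10}}},
    {"application": {"selector": {"matchLabels": {"importance": "high"}},
                     "limitPerContainer": {"policy": "drop", "maxRecordsPerSecond": 1000}}},
]

low = limit_for(inputs, {"importance": "low", "app": "batch"})
high = limit_for(inputs, {"importance": "high"})
unlabelled = limit_for(inputs, {"app": "other"})

print(low, high, unlabelled)  # prints "10 1000 None"
```

A container without an `importance` label matches neither selector, so in this sketch no per-container limit applies to it.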

/cc @jcantrill @vimalk78 @eranra
/assign @alanconway

Links

Depending on PR(s): NA
Bugzilla: NA
Github issue: NA
JIRA: https://issues.redhat.com/browse/LOG-2207
Enhancement proposal: https://issues.redhat.com/browse/LOG-1043

Oct 07 '22 11:10 Pranjal-Gupta2

cc @alanconway @vimalk78

Oct 13 '22 19:10 jcantrill

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, Pranjal-Gupta2

Needs approval from an approver in each of these files:

~~OWNERS~~ [jcantrill]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

Oct 13 '22 19:10 openshift-ci[bot]

/hold

Nov 02 '22 13:11 jcantrill

@Pranjal-Gupta2: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/functional	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	true	`/test functional`
ci/prow/e2e-ocp-next	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	false	`/test e2e-ocp-next`
ci/prow/e2e	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	true	`/test e2e`
ci/prow/e2e-claim-aws	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	false	`/test e2e-claim-aws`
ci/prow/unit	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	true	`/test unit`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Nov 09 '22 00:11 openshift-ci[bot]

/test functional

Jan 09 '23 09:01 Pranjal-Gupta2

/retest

May 03 '23 18:05 jcantrill

/retest

May 30 '23 19:05 jcantrill

/hold cancel

May 30 '23 19:05 jcantrill

/hold

May 30 '23 19:05 jcantrill

@Pranjal-Gupta2 make sure to work with @syedriko if needed to ensure the vector changes are included with the image we are using for the 5.8 release

May 30 '23 19:05 jcantrill

/test e2e-target

Jun 01 '23 13:06 jcantrill

/test functional

Jun 01 '23 14:06 jcantrill

/test e2e-target

Jun 01 '23 17:06 jcantrill

/test functional

Jun 01 '23 19:06 jcantrill

/test e2e

Jun 05 '23 16:06 jcantrill

@jcantrill: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

/test ci-index-cluster-logging-operator-bundle
/test e2e-target
/test functional
/test images
/test lint
/test unit

The following commands are available to trigger optional jobs:

/test e2e-ocp-target-minus-one
/test e2e-ocp-target-minus-two
/test functional-target

Use /test all to run all jobs.

In response to this:

/test e2e

Jun 05 '23 16:06 openshift-ci[bot]

/retest

merged https://github.com/openshift/cluster-logging-operator/pull/2055 to hopefully resolve one flake

Jun 14 '23 17:06 jcantrill

@Pranjal-Gupta2 you may need to rebase on https://github.com/openshift/cluster-logging-operator/pull/2055, if you have not already, to take advantage of the changes. I'm not certain they fix the issue, but the tests in that PR passed without reproducing the issue at hand.

Jun 14 '23 21:06 jcantrill

/retest

Jun 19 '23 17:06 alanconway

@Pranjal-Gupta2: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-ocp-next	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	false	`/test e2e-ocp-next`
ci/prow/e2e	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	true	`/test e2e`
ci/prow/e2e-claim-aws	95fe2b20bdfeaa3c455a8aa1f918ff2707d2ca20	link	false	`/test e2e-claim-aws`
ci/prow/ci-bundle-cluster-logging-operator-bundle	381d32668d2a0948d6edff356c79d239b98072dd	link	true	`/test ci-bundle-cluster-logging-operator-bundle`

Jul 11 '23 20:07 openshift-ci[bot]

/retest

Jul 11 '23 20:07 Pranjal-Gupta2

/hold
/lgtm

Jul 12 '23 17:07 jcantrill

/hold cancel

Jul 12 '23 17:07 jcantrill
