
[capi] Auditd produces too many logs

sbueringer opened this issue 4 years ago • 14 comments

What steps did you take and what happened:

I built the latest CAPI Ubuntu image (QEMU in this case, but this should affect all targets) and used it for testing in Cluster API Provider OpenStack. I saw that auditd produces a lot of logs: a simple cluster creation with one control plane node and 5 worker nodes produces about 200 MB of logs on the control plane node alone, and I wasn't even running any tests against the cluster.
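
A rough way to gauge the volume on a node is something like the following (a quick sketch assuming the usual Ubuntu log locations, which may differ in other images):

# Rough check of how much space and how many syslog lines the audit output takes
# (default Ubuntu paths assumed; adjust for your image)
du -sh /var/log/audit /var/log/journal
grep -c 'audit(' /var/log/syslog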

An example can be seen here (unfortunately I didn't catch it before it wasted a lot of space in the test buckets).

What did you expect to happen:

I was expecting some audit logs, but maybe not that many :)

Anything else you would like to add:

I'm mostly opening this issue to ask if this is intended. I filter out the audit logs in the CAPO e2e tests now and in my production use cases I don't use the upstream auditd config.

If there is consensus, we could configure auditd to produce a bit fewer logs.

Environment:

Project: Image Builder for Cluster API

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver):
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Cluster-api version (if using): latest Cluster API Provider OpenStack, but this shouldn't matter
  • Kubernetes version (use kubectl version): v1.20.4

/kind bug

sbueringer avatar Mar 19 '21 08:03 sbueringer

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jun 17 '21 09:06 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot avatar Jul 17 '21 10:07 fejta-bot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Aug 16 '21 10:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to the /close command in the triage robot's comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 16 '21 10:08 k8s-ci-robot

/reopen /lifecycle frozen

randomvariable avatar Sep 14 '21 10:09 randomvariable

@randomvariable: Reopened this issue.

In response to this:

/reopen /lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 14 '21 10:09 k8s-ci-robot

We have the same issue in CAPA

randomvariable avatar Sep 14 '21 10:09 randomvariable

Looking into this I think there are two options:

  • Make the containerd audit rules opt-in for CIS compliance
  • Default to masking the systemd-journald-audit.socket so audit logs only go to /var/log/audit/audit.* (see the sketch below)
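
For the second option, a minimal sketch of what that could look like on a systemd-based image (illustration only, not an existing image-builder task):

# Mask the socket so journald stops picking up kernel audit messages;
# audit records still end up in /var/log/audit/audit.log via auditd.
systemctl mask --now systemd-journald-audit.socket
systemctl restart systemd-journald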

randomvariable avatar Sep 14 '21 13:09 randomvariable

2nd option would be the same as on other operating systems?

Might be good to try to keep it consistent.

sbueringer avatar Sep 14 '21 14:09 sbueringer

2nd option would be the same as on other operating systems?

I checked, and in fact Fedora / RH do still log to journald; the spamming is caused by the extra rules we added for containerd due to the CIS benchmark.
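
For illustration only (not necessarily the exact rules image-builder ships), a watch rule of this shape is the kind of thing that floods the logs, since every access under a busy containerd directory produces an audit record:

# Illustrative watch rule: logs reads, writes, executes and attribute changes
# under the containerd state directory, which adds up very quickly.
auditctl -w /var/lib/containerd -p rwxa -k containerd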

randomvariable avatar Sep 14 '21 17:09 randomvariable

And do we get the spam on all OSes, or are these rules Ubuntu-specific?

sbueringer avatar Sep 14 '21 18:09 sbueringer

Spam is on everything except Flatcar by the looks of the Ansible.

randomvariable avatar Sep 15 '21 09:09 randomvariable

Found this GH issue after observing the same behaviour on our machines. Joining a node into a cluster, with a handful of containers getting deployed to it, produces more than 1 million lines in syslog during startup, most of them from auditd. I can't be the only one who thinks this is a little extreme for a default configuration.

itspngu avatar May 03 '22 10:05 itspngu

I forgot to follow up on this and just got reminded when browsing the open issues to see if another problem of mine had already been addressed.

Since we build the CAPI images in our own CI, I implemented a workaround for the audit spam: the custom_role_names Packer config key, which gets passed to the Ansible provisioner, lets us include a role that simply removes the containerd audit rules. I can't copy our exact working setup verbatim because we're basically extending the build container by dropping some extra files into the right places plus a shell script to glue everything together, but the gist of it is something along the lines of the following:

{
  // This is a JSON file that you pass to Packer by adding its path to the PACKER_VAR_FILES env var
  "custom_role": "true",
  "custom_role_names": "auditd-please-shut-up-aaaaaaah"
}

# This is an Ansible role that you'd want to put in the correct path
# (in this example, ansible/roles/auditd-please-shut-up-aaaaaaah/tasks/main.yml)
- name: Disable extended audit rules (https://github.com/kubernetes-sigs/image-builder/issues/556)
  file:
    # https://github.com/kubernetes-sigs/image-builder/blob/f4b84b0c42cf32d3a6bff164a412ea3adfd41915/images/capi/ansible/roles/node/tasks/main.yml#L81
    path: /etc/audit/rules.d/containerd.rules
    state: absent

# Finally, point Packer at the override file and run the build as usual
export PACKER_VAR_FILES="$PACKER_VAR_FILES packer/auditd-please-shut-up-aaaaaaah-override.json"
make build-<provider>-<image>

Hope this helps.

itspngu avatar Jul 30 '22 15:07 itspngu