
[capi] Auditd produces too many logs

sbueringer opened this issue 4 years ago • 14 comments

What steps did you take and what happened:

I built the latest CAPI Ubuntu image (QEMU in this case, but this should affect all targets) and used it for testing in Cluster API Provider OpenStack. I saw that auditd produces a lot of logs: a simple cluster creation with one control plane node and 5 worker nodes produces about 200 MB of logs on the control plane node alone, and I wasn't even running any tests against the cluster.
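
A rough way to gauge the volume on a node is something like the following (a quick sketch assuming the usual Ubuntu log locations, which may differ in other images):

# Rough check of how much space and how many syslog lines the audit output takes
# (default Ubuntu paths assumed; adjust for your image)
du -sh /var/log/audit /var/log/journal
grep -c 'audit(' /var/log/syslog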

An example can be seen here (unfortunately I didn't catch it before it wasted a lot of space in the test buckets).

What did you expect to happen:

I was expecting some audit logs, but maybe not that many :)

Anything else you would like to add:

I'm mostly opening this issue to ask if this is intended. I filter out the audit logs in the CAPO e2e tests now and in my production use cases I don't use the upstream auditd config.

If there is consensus, we could configure auditd to produce a bit fewer logs.

Environment:

Project: Image Builder for Cluster API

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver):
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Cluster-api version (if using): latest Cluster API Provider OpenStack, but this shouldn't matter
  • Kubernetes version (use kubectl version): v1.20.4

/kind bug

sbueringer avatar Mar 19 '21 08:03 sbueringer

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jun 17 '21 09:06 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot avatar Jul 17 '21 10:07 fejta-bot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Aug 16 '21 10:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to the /close command in the triage robot's comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 16 '21 10:08 k8s-ci-robot

/reopen /lifecycle frozen

randomvariable avatar Sep 14 '21 10:09 randomvariable

@randomvariable: Reopened this issue.

In response to this:

/reopen /lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 14 '21 10:09 k8s-ci-robot

We have the same issue in CAPA

randomvariable avatar Sep 14 '21 10:09 randomvariable

Looking into this I think there are two options:

  • Make the containerd audit rules opt-in for CIS compliance
  • Default to masking the systemd-journald-audit.socket so audit logs only go to /var/log/audit/audit.* (see the sketch below)
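
For the second option, a minimal sketch of what that could look like on a systemd-based image (illustration only, not an existing image-builder task):

# Mask the socket so journald stops picking up kernel audit messages;
# audit records still end up in /var/log/audit/audit.log via auditd.
systemctl mask --now systemd-journald-audit.socket
systemctl restart systemd-journald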

randomvariable avatar Sep 14 '21 13:09 randomvariable

2nd option would be the same as on other operating systems?

Might be good to try to keep it consistent.

sbueringer avatar Sep 14 '21 14:09 sbueringer

2nd option would be the same as on other operating systems?

I checked, and in fact Fedora / RH do still log to journald; the spamming is caused by the extra rules we added for containerd due to the CIS benchmark.
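
For illustration only (not necessarily the exact rules image-builder ships), a watch rule of this shape is the kind of thing that floods the logs, since every access under a busy containerd directory produces an audit record:

# Illustrative watch rule: logs reads, writes, executes and attribute changes
# under the containerd state directory, which adds up very quickly.
auditctl -w /var/lib/containerd -p rwxa -k containerd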

randomvariable avatar Sep 14 '21 17:09 randomvariable

And do we get the spam on all OSes, or are these rules Ubuntu-specific?

sbueringer avatar Sep 14 '21 18:09 sbueringer

Spam is on everything except Flatcar by the looks of the Ansible.

randomvariable avatar Sep 15 '21 09:09 randomvariable

Found this GH issue after observing the same behaviour on our machines. Joining a node into a cluster, with a handful of containers getting deployed to it, produces more than 1 million lines in syslog during startup, most of them from auditd. I can't be the only one who thinks this is a little extreme for a default configuration.

itspngu avatar May 03 '22 10:05 itspngu

I forgot to follow up on this and just got reminded when browsing the open issues to see if another problem of mine had already been addressed.

Since we build the CAPI images in our own CI, I implemented a workaround for the audit spam: the custom_role_names Packer config key, which gets passed to the Ansible provisioner, lets us include a role that simply removes the containerd audit rules. I can't copy our exact working setup verbatim because we're basically extending the build container by dropping some extra files into the right places plus a shell script to glue everything together, but the gist of it is something along the lines of the following:

{
  // This is a JSON file that you pass to Packer by adding its path to the PACKER_VAR_FILES env var
  "custom_role": "true",
  "custom_role_names": "auditd-please-shut-up-aaaaaaah"
}

# This is an Ansible role that you'd want to put in the correct path
# (in this example, ansible/roles/auditd-please-shut-up-aaaaaaah/tasks/main.yml)
- name: Disable extended audit rules (https://github.com/kubernetes-sigs/image-builder/issues/556)
  file:
    # https://github.com/kubernetes-sigs/image-builder/blob/f4b84b0c42cf32d3a6bff164a412ea3adfd41915/images/capi/ansible/roles/node/tasks/main.yml#L81
    path: /etc/audit/rules.d/containerd.rules
    state: absent

# Finally, point Packer at the override file and run the build as usual
export PACKER_VAR_FILES="$PACKER_VAR_FILES packer/auditd-please-shut-up-aaaaaaah-override.json"
make build-<provider>-<image>

Hope this helps.

itspngu avatar Jul 30 '22 15:07 itspngu