Calico node in unprivileged mode tries to write /proc/sys for accept_ra

When running calico-node unprivileged, we see tons of the following messages in the logs (tens of millions per day):

calico-node-dxmj5 calico-node 2022-01-02 12:08:38.007 [INFO][34] felix/endpoint_mgr.go 1179: Applying /proc/sys configuration to interface. ifaceName="enid6a600c4d5b"
calico-node-dxmj5 calico-node 2022-01-02 12:08:38.007 [WARNING][34] felix/endpoint_mgr.go 716: Failed to configure interface, will retry error=
calico-node-dxmj5 calico-node 2022-01-02 12:08:38.110 [WARNING][34] felix/endpoint_mgr.go 1175: Could not set accept_ra: ifaceName="enid6a600c4d5b"

Expected Behavior

Calico node does not try to change the value of "/proc/sys/net/ipv6/conf/%s/accept_ra" when it is already zero.
If changing the setting fails, a warning is logged once.

Current Behavior

Millions of warnings are logged per day even when the setting is already correct.

Possible Solution

Felix endpoint_mgr should:

only try to set accept_ra to 0 if it is not already 0 (which is true in our case), see https://github.com/projectcalico/calico/blob/master/felix/dataplane/linux/endpoint_mgr.go#L1169
only log a warning or error once if it fails. There is no use in retrying, as it will always fail if the container doesn't have the privileges

Steps to Reproduce (for bugs)

Run calico-node unprivileged

Context

As a best practice we want to run containers with minimal privileges.

Your Environment

Calico version: docker.io/calico/node:v3.21.2
Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
Operating System and version: linux
Link to your project (optional):

Jan 02 '22 12:01 schans

I see a related error regarding sysctl setting with IPv4 neighbor parameters in #5341 which would be fixed by some simple changes to error handling (also running in unpriv container). Looks like the logic around detecting unchangeable sysctls could use some changes. Recent Kubernetes already has done this (see feature gate KubeletInUserNamespace=true).

Jan 03 '22 12:01 Ukko-Ylijumala

Now that I tested setting that parameter in one of my worker nodes, it works:

echo 0 | sudo tee /proc/sys/net/ipv6/conf/cali*/accept_ra
0

I am running the container as a Proxmox unpriv container with "nested=1", which apparently does some changes around /proc and /sys handling. Might be worth a shot looking at how they are doing the nesting (for a quick workaround maybe).

Jan 03 '22 12:01 Ukko-Ylijumala

Thanks for your reply. Unfortunately we don't have that setting available to us in a docker/kubernetes setup.

Jan 03 '22 18:01 schans

If the settings in configureInterface() are not critical, we could change the logic somewhat to only attempt to write them if the process has write access. There also seem to be some oddities from line 1171 onward, where the code checks for IPv4 interfaces while writing IPv6 settings.

Looking at the logic at wlIfaceNamesToReconfigure on line 708 (https://github.com/projectcalico/calico/blob/a89955005f08bc4415674bd692f787649e6d9136/felix/dataplane/linux/endpoint_mgr.go#L1160), it could potentially be refactored somewhat into:

check if interface exists
check if a setting of the interface is writeable (here my assumption is that they are either all writeable or none)
if not, log debug message with "cannot configure interface %s, /proc/sys is read only"
if yes, call configureInterface

If this makes sense, I'm happy to try to provide a patch.

Jan 03 '22 20:01 schans

Might need some shaving, testing and love but this would be my general gist: https://github.com/projectcalico/calico/pull/5350

Jan 03 '22 21:01 schans

@schans Please can you explain your use case in more detail? calico-node needs certain privileges to do what it does, so running completely unprivileged isn't a useful option.

Jan 04 '22 11:01 neiljerram

CC @lmm ; I think he was working on something similar.

Jan 04 '22 15:01 fasaxc

@neiljerram Happy to elaborate. We would like to run calico-node as non-privileged if possible. We are using the tigera operator to deploy calico on EKS in AWS. We followed the documentation at https://projectcalico.docs.tigera.io/security/non-privileged and added the "nonPrivileged: enabled" setting to the installation crd.

Everything seems to be working fine, but we noticed a stream of errors/warnings in the logs as calico-node keeps trying to change some settings in /proc/sys for the network interfaces but doesn't have the permissions to do so. We managed to change the log level to error by setting "logSeverityScreen: Error" on the default felixconfiguration (maybe there should be a nicer way through the operator).

So maybe the first thing to analyze/discuss is whether, with the restrictions specified in the documentation, it is supported and makes sense to run non-privileged at all. The caveat mentioned in the documentation is a bit vague, to be honest:

"The tradeoff for more security is the overhead of Calico networking management. For example, you no longer receive Calico corrections to misconfigurations caused by other components within your cluster, along with limited support for new features"

If running calico with these restrictions is a valid option, then it makes sense to address the issue I raised, as it looks like calico keeps trying to write to /proc/sys in a "loop" because it errors out.

For me it is difficult to judge how important the changes that calico tries to make to the interface configs in /proc/sys are for normal operations.

HTH!

Jan 04 '22 15:01 schans

Worth mentioning that we're only looking for the NetworkPolicies feature; IPAM is still handled by aws-vpc-cni.

Jan 04 '22 16:01 michalschott

Many thanks @schans @michalschott . @lmm Does this align with your work?

Jan 05 '22 18:01 neiljerram

@neiljerram I wrote the original lines of code to disable accept_ra a while ago, but had a quick chat with @mgleung who worked on adding non-privileged support. Given #5341 we'll probably want something to handle setting sysctls more gracefully when running Calico non-privileged.

Thanks for the PR @schans , I'll take a look at that.

Jan 05 '22 22:01 lmm

Hi all, and many thanks to @schans for providing a potential solution. I am facing the same issue (AWS EKS with AWS VPC CNI and non-privileged calico-node). @fasaxc and @caseydavenport Are there any updates regarding the PR? Could you elaborate on what is still missing?

Jul 07 '22 14:07 tgip-work

From the latest comments, looks like that PR:

Has some failing tests that need to be fixed up
Needs a rebase to fix some conflicts
Has some feedback from @fasaxc that needs to be addressed.

Jul 08 '22 16:07 caseydavenport

Ran into this issue as well using tigera-operator helm chart v3.24.1. It looks like that PR has been closed as 'working as expected', while we still see tons of these messages in the logs.

In my case I can confirm that all ifaces are already set to 0 under /proc/sys/net/ipv6/conf/eni-*/accept_ra. My setup is EKS v1.23 based, networkpolicy only (IPAMD on aws-vpc-cni). Other than for the superfluous logging, everything seems to be working OK.

Sep 27 '22 13:09 jortkoopmans

This issue is also one of the blockers preventing us (https://gardener.cloud/) from running calico-node in unprivileged mode by default.

Sep 27 '22 13:09 ialidzhikov

Sorry, "working as expected" is probably an oversimplification of the issue. I believe we needed to address the issue of concurrent interface deletion for a fix to be accepted. Any takers?

Sep 30 '22 04:09 mgleung

Hello, this issue is a blocker for us as we're applying Pod Security Policies. Bump =)

May 01 '24 23:05 dantech2000
