bosh-agent icon indicating copy to clipboard operation
bosh-agent copied to clipboard

Skip NATS firewall creation when cgroupv2 enabled

Open ramonskie opened this issue 1 year ago • 12 comments

this will add the following nftables rules when cgroupv2 are enabled on a linux system

socket cgroupv2 level 2 "system.slice/bosh-agent.service" ip daddr 1.2.3.4 tcp dport 1234 log prefix "Matched cgroup bosh-agent nats rule: " accept
meta skuid 0 ip daddr 1.2.3.4tcp dport 1234 log prefix "Matched skuid director nats rule: " accept
ip daddr 1.2.3.4 tcp dport 1234 log prefix "dropped nats rule: " drop

ramonskie avatar Jun 17 '24 10:06 ramonskie

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: ramonskie / name: Ramon Makkelie (95239be32c268ddecadb3b48adbbda2903806f95, e6eb8721ee9b55c941254f99285e72f3ccb0ccdf, 6270d9370141fb63fab426aedf48c4e1e80a7d3b, adfe22b44539ffc0f7e3350e76628c32655f2d53, 65156af956276a24032f9bf9b3241f75d4bdec95)
  • :white_check_mark: login: jpalermo / name: Joseph Palermo (6553c7a964d7455bdceb0d41f7cbce8d9a0684a4, 6448a75310d5f049f8ec91a388d0142b20e1c5af)

currenly the test will not work as we don't test it on a stemcell. as this test assumes it has a bosh-agent and systemd available so the test should be improved. any suggestions?

ramonskie avatar Jun 24 '24 11:06 ramonskie

For testing, if we are relying on the CFF Concourse instances, we may need to wait for a cgroups v2 stemcell to become available and for Concourse to be able to consume it for the cgroup part. Unsure about systemd and whether that's present in a Concourse container.

aramprice avatar Jun 24 '24 23:06 aramprice

We talked in the FIWG meeting about having the tests conditional on the cgroups currently enabled. So for now, in the bosh-agent pipelines, the cgroupsv2 code path wouldn't actually get tested. But as soon as we start shipping Noble and add a job for the pipeline, the cgroupsv2 tests would be active when running against Noble.

jpalermo avatar Jun 29 '24 23:06 jpalermo

@ramonskie did you have a chance to take a look at ^

rkoster avatar Jul 11 '24 14:07 rkoster

Once CI is green again @ramonskie is gonna look at this (as discussed during working group meeting).

rkoster avatar Sep 05 '24 14:09 rkoster

Hey @ramonskie and @rkoster, some of the new tests are still failing because they're trying to call iptables during the SetupNatsFirewall(test.mbus) call in the tests.

Since the unit tests are running on jammy, they're not using the nftables branch.

We'd probably need to either delete that test, or inject enough dependencies in the unit tests to prevent them from actually doing anything.

But larger question, can we just delete all this code?

The firewall rules are designed to prevent workloads from talking to NATs because the NATs creds used to be accessible from the IaaS metadata endpoints. But we've switched to using temporary NATs creds, so that's no longer an attack vector.

This does provide an additional layer of security, but it's a LOT of complexity to maintain for a "second fence".

We should discuss at the FIWG meeting tomorrow.

jpalermo avatar Sep 18 '24 18:09 jpalermo

The decision was made to not implement this logic in Nobel because it doesn't bring more value from security perspective.

beyhan avatar Sep 19 '24 14:09 beyhan

Updated the PR to skip firewall creation when using cgroups v2. The commits adding nftables support are still there in case we want them as a reference in the future.

jpalermo avatar Sep 19 '24 21:09 jpalermo

@beyhan, I added you as a reviewer because you said you were going to check if you all had any security concerns with this.

jpalermo avatar Sep 19 '24 22:09 jpalermo

@a-hassanin , just assigning you to check if you have any security concerns about this.

jpalermo avatar Sep 26 '24 14:09 jpalermo

The following patch should resolve the linter error:

diff --git a/platform/net/firewall_provider_linux.go b/platform/net/firewall_provider_linux.go
index 306d8938..9298d665 100644
--- a/platform/net/firewall_provider_linux.go
+++ b/platform/net/firewall_provider_linux.go
@@ -6,17 +6,15 @@ package net
 import (
 	"errors"
 	"fmt"
-	bosherr "github.com/cloudfoundry/bosh-utils/errors"
 	"net"
 	gonetURL "net/url"
 	"os"
 	"strings"
-	// NOTE: "cgroups is only intended to be used/compiled on linux based system"
-	// see: https://github.com/containerd/cgroups/issues/19
-	"github.com/containerd/cgroups"
-	"github.com/opencontainers/runtime-spec/specs-go"
 
+	bosherr "github.com/cloudfoundry/bosh-utils/errors"
+	"github.com/containerd/cgroups" // NOTE: linux only; see: https://github.com/containerd/cgroups/issues/19
 	"github.com/coreos/go-iptables/iptables"
+	"github.com/opencontainers/runtime-spec/specs-go"
 )
 
 const (

aramprice avatar Sep 26 '24 16:09 aramprice

@beyhan any news on the security front?

rkoster avatar Oct 03 '24 14:10 rkoster

I'm waiting for feedback here.

beyhan avatar Oct 11 '24 07:10 beyhan

Mentioning the feature PR https://github.com/cloudfoundry/bosh/pull/2417 as it is relevant for the decision here.

a-hassanin avatar Oct 11 '24 09:10 a-hassanin

It looks Ok from our side we can merge it. We will need to revisit this from SAP side when we want to use Noble bec. as I saw we need a PoC to make sure it will not pose a security risk on AliCloud. But this we will plan later.

a-hassanin avatar Oct 11 '24 13:10 a-hassanin