nginx-service-mesh
grpc traffic allowed when accessControlMode=deny
I deployed NSM with accessControlMode=deny, and gRPC still works perfectly.
ACTUAL RESULTS
Transactions work with gRPC, when they should fail.
EXPECTED
I expected that traffic would be forbidden. This is the case with HTTP, but not with gRPC:
<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.21.6</center>
</body>
</html>
STEPS (General)
- Deploy NSM
- Deploy server application
- Deploy client application
- Redeploy NSM with accessControlMode=deny and verify:
  kubectl delete --namespace "nginx-mesh" \
    $(kubectl get pods \
        --namespace "nginx-mesh" \
        --selector "app.kubernetes.io/name=nginx-mesh-api" \
        --output name)
  # verify deny
  nginx-meshctl config | jq -r .accessControlMode
- Attempt gRPC traffic between members within the service mesh
STEPS (Specific)
These are steps from within my project, but really anything similar should work. I used grpcurl to test the gRPC traffic:
URLS=(https://docs.nginx.com/nginx-service-mesh/examples/{prometheus,grafana,otel-collector,jaeger}.yaml)
for URL in ${URLS[*]}; do curl -sOL $URL; done
for FILE in {prometheus,grafana,otel-collector,jaeger}.yaml; do kubectl apply -f $FILE; done
cat << EOF > nsm.yaml
# nsm.yaml
prometheusAddress: prometheus.nsm-monitoring.svc:9090
telemetry:
  exporters:
    otlp:
      host: otel-collector.nsm-monitoring.svc
      port: 4317
  samplerRatio: 1
tracing: null
mtls:
  mode: permissive
autoInjection:
  disable: false
  disabledNamespaces:
    - nsm-monitoring
EOF
cat << EOF > dgraph.yaml
# dgraph.yaml
image:
  tag: v21.03.2
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: "0.0.0.0/0"
EOF
cat << EOF > pydgraph.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pydgraph-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pydgraph-client
  template:
    metadata:
      labels:
        app: pydgraph-client
    spec:
      containers:
        - name: pydgraph-client
          image: darknerd/pydgraph-client:latest
          env:
            - name: DGRAPH_ALPHA_SERVER
              value: dgraph-dgraph-alpha.dgraph.svc.cluster.local
          resources:
            requests:
              memory: "64Mi"
              cpu: "80m"
            limits:
              memory: "128Mi"
              cpu: "250m"
EOF
###################
# deploy mesh
########################
helm repo add nginx-stable https://helm.nginx.com/stable
helm install nginx-mesh nginx-stable/nginx-service-mesh --values nsm.yaml
###################
# deploy server to mesh
########################
kubectl get namespace "dgraph" > /dev/null 2> /dev/null \
|| kubectl create namespace "dgraph" \
&& kubectl label namespaces "dgraph" name="dgraph"
helm repo add dgraph https://charts.dgraph.io
helm template dgraph dgraph/dgraph --values dgraph.yaml \
| nginx-meshctl inject \
--ignore-incoming-ports 5080,7080 \
--ignore-outgoing-ports 5080,7080 \
  | kubectl apply --namespace "dgraph" --filename -
###################
# deploy client to mesh
########################
kubectl get namespace "pydgraph-client" > /dev/null 2> /dev/null \
|| kubectl create namespace "pydgraph-client" \
&& kubectl label namespaces "pydgraph-client" name="pydgraph-client"
cat pydgraph.yaml \
| nginx-meshctl inject \
| kubectl apply --namespace "pydgraph-client" --filename -
###################
# set accessControlMode=deny
########################
cat << EOF > nsm_deny.yaml
# nsm_deny.yaml
prometheusAddress: prometheus.nsm-monitoring.svc:9090
telemetry:
  exporters:
    otlp:
      host: otel-collector.nsm-monitoring.svc
      port: 4317
  samplerRatio: 1
tracing: null
mtls:
  mode: permissive
autoInjection:
  disable: false
  disabledNamespaces:
    - nsm-monitoring
accessControlMode: deny
EOF
# patch configuration
helm upgrade nginx-mesh nginx-stable/nginx-service-mesh --values nsm_deny.yaml
# delete pod using old configuration
kubectl delete --namespace "nginx-mesh" \
$(kubectl get pods --namespace "nginx-mesh" --selector "app.kubernetes.io/name=nginx-mesh-api" --output name)
# few seconds later, verify configuration is now deny
nginx-meshctl config | jq -r .accessControlMode
###################
# exec into client
########################
export CLIENT_NAMESPACE="pydgraph-client"
PYDGRAPH_POD=$(kubectl get pods --namespace $CLIENT_NAMESPACE --output name)
kubectl exec -ti --container "pydgraph-client" --namespace $CLIENT_NAMESPACE ${PYDGRAPH_POD} -- bash
###################
# NOTE: this is run in the pydgraph-client container
########################
# verify HTTP doesn't work (these fail as expected)
curl --silent ${DGRAPH_ALPHA_SERVER}:8080/health
curl --silent ${DGRAPH_ALPHA_SERVER}:8080/state
# verify gRPC doesn't work (these succeed, which is NOT expected)
grpcurl -plaintext -proto api.proto ${DGRAPH_ALPHA_SERVER}:9080 api.Dgraph/CheckVersion
# GRPC using python works as well, should not work:
python3 load_data.py --plaintext --alpha ${DGRAPH_ALPHA_SERVER}:9080 \
--files ./sw.nquads.rdf --schema ./sw.schema
Our gRPC support does not have feature parity with HTTP. The TrafficTarget objects do not affect gRPC traffic. This deserves an explanation in the documentation.
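For context, a TrafficTarget is the SMI resource that restricts traffic under accessControlMode=deny; the sketch below shows the general shape, assuming the SMI Traffic Access v1alpha2 API. All names and service accounts here are hypothetical, and per the comment above such a policy governs HTTP but has no effect on gRPC:

```yaml
# Hypothetical SMI TrafficTarget: allow only the client's service
# account to reach the Dgraph alpha service account over HTTP routes.
apiVersion: access.smi-spec.io/v1alpha2
kind: TrafficTarget
metadata:
  name: dgraph-alpha-target        # hypothetical name
  namespace: dgraph
spec:
  destination:
    kind: ServiceAccount
    name: dgraph                   # assumed service account of the alpha pods
    namespace: dgraph
  sources:
    - kind: ServiceAccount
      name: default                # assumed service account of the client pod
      namespace: pydgraph-client
  rules:
    - kind: HTTPRouteGroup
      name: dgraph-routes          # defined separately
      matches:
        - all-routes
```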
We chose this path, in part, to seamlessly adopt a specific gRPC SMI traffic spec as it was developed. See here. In lieu of SMI support, we will more clearly document the lack of gRPC support.
Note that according to the docs linked:
gRPC - there should be a gRPC-specific traffic spec. As part of the first version, this has been left out as HTTPRouteGroup can be used in the interim.
Thus the HTTPRouteGroup could be used in the interim, but unfortunately NSM's implementation does not support this. Given that adding an integrated ingress controller with kubernetes-ingress (i.e. the NGINX and NGINX Plus Ingress Controllers) requires every service that needs ingress to be part of the mesh, this increases the attack surface unnecessarily.
Setting the NGINX Service Mesh accessControlMode config to deny and using an HTTPRouteGroup to restrict traffic to sensitive services on the mesh would be one way to mitigate such risk.
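An HTTPRouteGroup used that way might look like the following sketch (assuming the SMI Traffic Specs v1alpha4 API; the name, path pattern, and method list are hypothetical):

```yaml
# Hypothetical HTTPRouteGroup limiting clients to Dgraph's
# read-only health endpoint over HTTP.
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: dgraph-routes              # referenced by a TrafficTarget rule
  namespace: dgraph
spec:
  matches:
    - name: health-only
      pathRegex: "/health"
      methods:
        - GET
```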
However, given that this feature does not actually work with gRPC, the solution is difficult to recommend professionally in its current implementation. This especially affects backend databases like Dgraph that communicate over gRPC, as well as other microservices that increasingly adopt gRPC for its performance benefits; gRPC is extremely popular with microservices, distributed databases, and other stateful services.
I would like to make this a feature request for gRPC support. Since using the integrated ingress controller requires putting everything on the mesh, being able to lock gRPC traffic down would be valuable. If the full SMI gRPC traffic spec will take too long to arrive (1+ years), then something is needed in the interim, e.g. a GRPCRoute resource along the lines of https://github.com/kubernetes-sigs/gateway-api/blob/main/site-src/geps/gep-1016.md; otherwise, waiting for the official spec is fine if it lands in the next few months.
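For illustration, a gRPC-aware route in the shape proposed by GEP-1016 could express method-level rules that HTTPRouteGroup cannot. This is a sketch of an alpha API whose fields may change, and the backend name is assumed from the Dgraph helm chart:

```yaml
# Sketch of a GRPCRoute per GEP-1016 (Gateway API, alpha at the time;
# field names follow the GEP and may differ in released versions).
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: GRPCRoute
metadata:
  name: dgraph-grpc
  namespace: dgraph
spec:
  rules:
    - matches:
        - method:
            service: api.Dgraph      # gRPC service from Dgraph's api.proto
            method: CheckVersion
      backendRefs:
        - name: dgraph-dgraph-alpha  # assumed service name from the chart
          port: 9080
```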
Commenting to keep the auto-close bot from closing this issue.
SMI looks to be on hold until the Gateway API and SMI efforts merge. In the meantime, having some CRD (even if proprietary) would be ideal until SMI and the Gateway SIG come up with a solution. Other service meshes that intend to adopt the standard, such as Linkerd, are doing this as far as I can tell, and will deprecate their proprietary CRDs once the standard ships official ones.