Potentially chown Elastic Agent hostpath data directory
A number of issues and PRs have touched on this problem: #5993, #6147, #6205, #6193.
The following is required when running Elastic Agent with a hostPath:
podTemplate:
  spec:
    containers:
    - name: agent
      securityContext:
        runAsUser: 0
If not, you get this error:
Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.5/fleet-troubleshooting.html
An initContainer that does the following allows Elastic Agent to work properly without the agent container itself running as runAsUser: 0:
initContainers:
- command:
  - sh
  - -c
  - chown 1000:1000 /usr/share/elastic-agent/state
  image: docker.elastic.co/beats/elastic-agent:8.5.0
  imagePullPolicy: IfNotPresent
  name: permissions
  securityContext:
    runAsUser: 0
This is more complicated in an environment such as OpenShift, where UIDs are randomized, but likely doable.
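For illustration, OpenShift records the UID and supplemental-group ranges it assigns to each namespace in annotations like the ones below (values are examples only); the container UID is picked from that range, which is why no fixed chown target can be assumed:

apiVersion: v1
kind: Namespace
metadata:
  name: elastic-dev
  annotations:
    # UIDs for pods in this namespace are allocated from this range
    openshift.io/sa.scc.uid-range: "1000620000/10000"
    # supplemental groups are allocated from this range
    openshift.io/sa.scc.supplemental-groups: "1000620000/10000"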
So the question is: do we pursue this path to make the UX for Elastic Agent more consistent between emptyDir and hostPath?
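For reference, a user who wants to avoid the hostPath behavior entirely can already override the Agent state volume with an emptyDir; a minimal sketch, assuming the ECK-managed volume is named agent-data (verify the volume name against the ECK docs for your version):

daemonSet:
  podTemplate:
    spec:
      volumes:
      - name: agent-data   # assumed name of the Agent state volume managed by ECK
        emptyDir: {}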
Security note:
- The initContainer still runs as runAsUser: 0.
- It only runs for a couple of seconds, as opposed to running indefinitely as uid 0, which minimizes the window in which a security issue could stem from this.
After discussion, we've decided to use an init container to improve this user experience. Since the GID in OpenShift is known, we'll take this approach:
initContainers:
- command:
  - sh
  - -c
  - chmod g+w /usr/share/elastic-agent/state && chgrp 1000 /usr/share/elastic-agent/state
  image: docker.elastic.co/beats/elastic-agent:8.5.0
  imagePullPolicy: IfNotPresent
  name: permissions
  securityContext:
    runAsUser: 0
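For orientation, here is a minimal sketch of where such an initContainer sits inside an Agent resource, trimmed to the relevant fields (the resource name is illustrative, and a fleet-mode Agent additionally needs its kibanaRef/fleetServerRef and policy wiring):

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent            # illustrative name
spec:
  version: 8.5.0
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        initContainers:
        - name: permissions
          image: docker.elastic.co/beats/elastic-agent:8.5.0
          command: ["sh", "-c", "chmod g+w /usr/share/elastic-agent/state && chgrp 1000 /usr/share/elastic-agent/state"]
          securityContext:
            runAsUser: 0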
Also related: https://github.com/elastic/cloud-on-k8s/issues/6280
Hi @naemono, I have been stuck with this issue for a couple of days and can't get it working. We are using OpenShift 4.12 and Argo CD with the Elastic operator on OpenShift.
I followed the official ECK 2.6 documentation and created the required resources.
Worth mentioning: we implemented the Compliance Operator and have used the CIS operator to harden the platform.
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-dev
  namespace: elastic-dev
spec:
  version: 8.6.1
  kibanaRef:
    name: kibanadev
  elasticsearchRefs:
  - name: esdev01
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-dev
  namespace: elastic-dev
spec:
  version: 8.6.1
  kibanaRef:
    name: kibanadev
  fleetServerRef:
    name: fleet-server-dev
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - pods
  - nodes
  - namespaces
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: elastic-dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent
subjects:
- kind: ServiceAccount
  name: elastic-agent
  namespace: elastic-dev
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
RoleBinding:
Name:         elastic-agent-rb
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  system:openshift:scc:privileged
Subjects:
  Kind            Name           Namespace
  ----            ----           ---------
  ServiceAccount  elastic-agent  elastic-dev
The hostPath is created on the physical machine, but we are still getting permission denied:
Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html
@gittihub123 I'll investigate this and get back to you.
@gittihub123 The below appears to be required in the case of OpenShift:
deployment:
  replicas: 1
  podTemplate:
    spec:
      containers:
      - name: agent
        securityContext:
          privileged: true  # <== This is the piece that's required in OpenShift
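In addition, OpenShift only admits a privileged container if the Pod's service account is allowed to use the privileged SCC. Following the same pattern as the filebeat example later in this thread, that would look roughly like this for the elastic-agent service account used above (verify the namespace and SCC choice against your cluster's hardening policy):

oc adm policy add-scc-to-user privileged -z elastic-agent -n elastic-dev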
Hi @naemono, this does not work on an OpenShift cluster because SELinux blocks it from creating files on the host filesystem.
The same applies when I try to create a standalone Filebeat instance with this configuration:
# CRD to create beats with ECK (Pod(s))
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: panos-filebeat
  namespace: elastic-dev
spec:
  type: filebeat
  version: 8.6.1
  elasticsearchRef:
    name: esdev
  kibanaRef:
    name: kibanadev
  config:
    filebeat.modules:
    - module: panw
      panos:
        enabled: true
        var.syslog_host: 0.0.0.0
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        securityContext:
          privileged: true
Error message:
one or more objects failed to apply, reason: admission webhook "elastic-beat-validation-v1beta1.k8s.elastic.co" denied the request: Beat.beat.k8s.elastic.co "panos-filebeat" is invalid: privileged: Invalid value: "privileged": privileged field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.
@gittihub123 Running Agent and/or Beat in an openshift environment has many more complexities than running in a standard Kubernetes environment. We document these issues here. We also have some beats recipes that we use in our e2e tests that we run on a regular basis here. I just successfully deployed this beat recipe on an openshift 4.9 cluster, following our documentation noted above, specifically:
oc adm policy add-scc-to-user privileged -z filebeat -n elastic
Then I applied this manifest, which worked after a bit of time (the Beat pods crash once or twice while users/API keys are being propagated throughout the Elastic Stack):
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: 8.6.0
  elasticsearchRef:
    name: testing
  kibanaRef:
    name: kibana
  config:
    filebeat.autodiscover.providers:
    - node: ${NODE_NAME}
      type: kubernetes
      hints.default_config.enabled: "false"
      templates:
      - condition.equals.kubernetes.namespace: log-namespace
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
      - condition.equals.kubernetes.labels.log-label: "true"
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
    processors:
    - add_cloud_metadata: {}
    - add_host_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        # dnsPolicy: ClusterFirstWithHostNet
        # hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            privileged: true
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
# ---
# My Elasticsearch cluster already existed....
# apiVersion: elasticsearch.k8s.elastic.co/v1
# kind: Elasticsearch
# metadata:
#   name: elasticsearch
# spec:
#   version: 8.6.1
#   nodeSets:
#   - name: default
#     count: 3
#     config:
#       node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 8.6.0
  count: 1
  elasticsearchRef:
    name: testing
# ...
Note the difference in the daemonSet.podTemplate.spec and where the securityContext is applied:
daemonSet:
  podTemplate:
    spec:
      serviceAccountName: filebeat
      automountServiceAccountToken: true
      terminationGracePeriodSeconds: 30
      # dnsPolicy: ClusterFirstWithHostNet
      # hostNetwork: true # Allows to provide richer host metadata
      containers:
      - name: filebeat
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          privileged: true
Hi @naemono, thank you for the explanation. Filebeat works now, but our goal is to implement Elastic Agent and enable different integrations to collect syslog from outside the cluster: Palo Alto, Cisco FTD, Cisco ASA, etc.
So far, the Elastic Agent is running and is managed by Fleet, but it's only collecting logs and metrics from OpenShift. The Elastic Stack is running in the same namespace and I have connectivity between all pods (Elasticsearch, Kibana, Fleet Server, and Elastic Agent).
This is my configuration:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: {{ .Values.kibana.name }}
  namespace: {{ .Values.namespace }}
spec:
  http:
    tls:
      certificate:
        secretName: {{ .Values.tls.certificate }}
  config:
    server.publicBaseUrl: "https://XXX.YYY.ZZZ/"
    xpack.fleet.agents.elasticsearch.hosts: ["https://esdev-es-http.elastic-dev.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-dev-agent-http.elastic-dev.svc:8220"]
    xpack.fleet.packages:
    - name: system
      version: latest
    - name: elastic_agent
      version: latest
    - name: fleet_server
      version: latest
    xpack.fleet.agentPolicies:
    - name: Fleet Server test
      id: eck-fleet-server
      is_default_fleet_server: true
      namespace: agent
      monitoring_enabled:
      - logs
      - metrics
      package_policies:
      - name: fleet_server-1
        id: fleet_server-1
        package:
          name: fleet_server
    - name: Elastic Agent on ECK policy
      id: eck-agent
      namespace: agent
      monitoring_enabled:
      - logs
      - metrics
      unenroll_timeout: 900
      is_default: true
      package_policies:
      - name: system-1
        id: system-1
        package:
          name: system
      - name: CiscoFTD
        id: CiscoFTD
        package:
          name: Cisco FTD
      - name: palo-alto
        id: palo-alto
        package:
          name: panos
  version: {{ .Values.version }}
  count: {{ .Values.kibana.nodes }}
  elasticsearchRef:
    name: {{ .Values.name }}
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            memory: {{ .Values.kibana.resources.limits.memory }}
            cpu: {{ .Values.kibana.resources.limits.cpu }}
I believe the network flow would be something like this, right?
Syslog source (Cisco FTD, Palo Alto, etc.) -> OpenShift route (for example ciscoftd.dev.test.com) -> Elastic Agent SVC (created by me to expose the Elastic Agents) -> elastic-agent pods.
Should this be possible, or should we try another way?
Thanks.
Syslog source (Cisco FTD, Palo Alto, etc.) -> OpenShift route (for example ciscoftd.dev.test.com) -> Elastic Agent SVC (created by me to expose the Elastic Agents) -> elastic-agent pods.
This solution makes sense to me, using a custom TCP agent integration...
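A minimal sketch of such a Service, assuming a hypothetical syslog TCP port of 9001 configured in the panw/panos integration and a selector matching the labels ECK puts on the Agent pods (check the actual pod labels with oc/kubectl before relying on this):

apiVersion: v1
kind: Service
metadata:
  name: elastic-agent-syslog      # hypothetical name
  namespace: elastic-dev
spec:
  type: ClusterIP
  selector:
    agent.k8s.elastic.co/name: elastic-agent-dev   # assumed label on the ECK-managed Agent pods
  ports:
  - name: panw-syslog
    protocol: TCP
    port: 9001        # hypothetical port, must match the integration's syslog listener
    targetPort: 9001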
The solution will not work if you use a keystore, because the operator inserts an initContainer before the permissions container....
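In other words, the rendered Pod spec ends up with the operator-managed keystore init container ahead of the user-supplied one, so it runs before the permissions have been fixed. Roughly, and purely as an illustration of the ordering (the container name and exact order are assumptions, verify against the generated Pod):

initContainers:
- name: elastic-internal-init-keystore   # assumed name of the operator-injected keystore init container; runs first
  # ... executes before the state directory permissions are adjusted
- name: permissions
  # ... the chmod/chgrp init container from above only runs afterwards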