ClickHouse Keeper in RO mode due to incorrect permissions on snapshots directory
I deployed ClickHouse Keeper using clickhouse-operator 0.24.0 with 3 nodes and a PVC.
Unfortunately, ClickHouse Keeper is in read-only mode because it cannot write to the coordination directories (for example /var/lib/clickhouse-keeper/coordination/logs/), which have incorrect permissions.
Below is the error message:
2024.10.07 16:52:07.388939 [ 1 ] {} <Error> void DB::Changelog::readChangelogAndInitWriter(uint64_t, uint64_t): Code: 76. DB::ErrnoException: Cannot open file /var/lib/clickhouse-keeper/coordination/logs/changelog_1_100000.bin: , errno: 13, strerror: Permission denied. (CANNOT_OPEN_FILE), Stack trace (when copying this message, always include the lines below):
I can deploy a working ClickHouse Keeper with clickhouse-operator 0.23.7 when not using a PVC:
# ---
# # Fake Service to drop-in replacement Zookeeper with CHK
# apiVersion: v1
# kind: Service
# metadata:
#   # DNS would be like zookeeper.namespace.svc
#   name: zookeeper
#   labels:
#     app: zookeeper
# spec:
#   ports:
#     - port: 2181
#       name: client
#     - port: 7000
#       name: prometheus
#   selector:
#     app: clickhouse-keeper
#     what: node
---
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: xxxxxxx
  labels:
    app: clickhouse-keeper
spec:
  configuration:
    clusters:
      - name: "chk-3"
        layout:
          replicasCount: 3
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/tcp_port: "2181"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      keeper_server/raft_configuration/server/port: "9444"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
  defaults:
    templates:
      # Templates are specified as default for all clusters
      podTemplate: default
  templates:
    podTemplates:
      - name: default
        spec:
          # affinity removed to allow use in single node test environment
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: "app"
                        operator: In
                        values:
                          - clickhouse-keeper
                  topologyKey: "kubernetes.io/hostname"
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: "clickhouse/clickhouse-keeper:24-alpine"
              resources:
                requests:
                  memory: "256M"
                  cpu: "1"
                limits:
                  memory: "4Gi"
                  cpu: "2"
    # volumeClaimTemplates:
    #   - name: both-paths
    #     spec:
    #       storageClassName: gp3-retain
    #       accessModes:
    #         - ReadWriteOnce
    #       resources:
    #         requests:
    #           storage: 10Gi
It seems that by default /var/lib/clickhouse-keeper/coordination/{logs,snapshots} are owned by root, but we need to ensure that everyone has write access.
Below are the permissions when not using a PVC:
chk-edp-global-finance-1:/# ls -ltrh /var/lib/clickhouse-keeper/
total 8K
drwxr-xr-x 4 root root 35 Oct 9 11:32 coordination
-rw-r----- 1 clickhou clickhou 36 Oct 9 11:32 uuid
drwxr-x--- 2 clickhou clickhou 6 Oct 9 11:32 rocksdb
-rw-r----- 1 clickhou clickhou 23 Oct 9 11:32 state
drwxr-x--- 2 clickhou clickhou 31 Oct 9 11:32 preprocessed_configs
chk-edp-global-finance-1:/# ls -ltrh /var/lib/clickhouse-keeper/coordination/
total 0
drwxrwxrwx 2 root root 38 Oct 9 11:32 snapshots
drwxrwxrwx 2 root root 41 Oct 9 11:32 logs
However, I believe it would be better to have these directories owned by root:clickhouse with rwxrwx--- permissions (770).
Would it help if you add securityContext as described here? https://github.com/Altinity/clickhouse-operator/issues/1370
Note that CHK is not compatible between 0.23.7 and 0.24.0; see the migration guide: https://github.com/Altinity/clickhouse-operator/blob/0.24.0/docs/keeper_migration_from_23_to_24.md
> Would it help if you add securityContext as described here? #1370
+1, this should be helpful
spec:
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsGroup: 101
    runAsUser: 101
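For context, here is a minimal sketch of where that snippet could sit in the ClickHouseKeeperInstallation manifest from the original report. It assumes the pod template `spec` is a regular Kubernetes PodSpec, and that 101 is the UID/GID of the clickhouse user in the image (the value used in this thread; it is worth confirming by running `id` inside the container). Everything other than the `securityContext` block is copied from the manifest above and trimmed.

```yaml
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: xxxxxxx
spec:
  templates:
    podTemplates:
      - name: default
        spec:
          # Pod-level security context (standard Kubernetes PodSpec field).
          # Assumption: 101 is the UID/GID of the clickhouse user in the image.
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            # For volume types that support it, mounted volumes get
            # group ownership 101 so the keeper process can write to them.
            fsGroup: 101
            # Skip the recursive ownership change when the volume root
            # already has the expected owner and permissions.
            fsGroupChangePolicy: OnRootMismatch
          containers:
            - name: clickhouse-keeper
              image: "clickhouse/clickhouse-keeper:24-alpine"
```

After the pods restart, the `ls -ltrh` checks shown earlier in the thread should indicate whether the coordination directories on the PVC became writable for the keeper user.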
@chengjoey, we are hesitant to include it in the code by default, but maybe it is a good thing to do.
IMO it should really be added by default if those are the permissions the container requires to run. I can't think of any reason this would be disadvantageous.
We do not do it for CHI either, so let's leave it out for now. I will open a separate issue to consider adding a default security context.
https://github.com/Altinity/clickhouse-operator/issues/1672