scylla-operator
Can't start scylla with default helm chart because very small volume size
Hello!
I ran into a problem: following the instructions at https://operator.docs.scylladb.com/stable/helm.html, I couldn't get a running Scylla cluster. It looks like the default PV size is 10Gi:
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  annotations:
    meta.helm.sh/release-name: scylla-scylla
    meta.helm.sh/release-namespace: scylla
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: scylla
    helm.toolkit.fluxcd.io/namespace: flux-system
  name: scylla-scylla
  namespace: scylla
spec:
  agentRepository: scylladb/scylla-manager-agent
  agentVersion: 2.5.2
  datacenter:
    name: us-east-1
    racks:
    - agentResources:
        requests:
          cpu: 50m
          memory: 10M
      members: 3
      name: us-east-1a
      resources:
        limits:
          cpu: 1
          memory: 4Gi
        requests:
          cpu: 1
          memory: 4Gi
      scyllaAgentConfig: scylla-agent-config
      scyllaConfig: scylla-config
      storage:
        capacity: 10Gi
  repository: scylladb/scylla
  version: 4.5.1
With this size, the pod fails with the following error:
I1230 14:38:26.581342 1 operator/sidecar.go:158] sidecar version "v1.6.0-7-gac9d88f"
I1230 14:38:26.581437 1 flag/flags.go:59] FLAG: --burst="5"
I1230 14:38:26.581445 1 flag/flags.go:59] FLAG: --cpu-count="1"
I1230 14:38:26.581448 1 flag/flags.go:59] FLAG: --help="false"
I1230 14:38:26.581452 1 flag/flags.go:59] FLAG: --kubeconfig=""
I1230 14:38:26.581456 1 flag/flags.go:59] FLAG: --loglevel="2"
I1230 14:38:26.581461 1 flag/flags.go:59] FLAG: --namespace="scylla"
I1230 14:38:26.581464 1 flag/flags.go:59] FLAG: --qps="2"
I1230 14:38:26.581469 1 flag/flags.go:59] FLAG: --secret-name="scylla-scylla-auth-token"
I1230 14:38:26.581472 1 flag/flags.go:59] FLAG: --service-name="scylla-scylla-us-east-1-us-east-1a-0"
I1230 14:38:26.581475 1 flag/flags.go:59] FLAG: --v="2"
I1230 14:38:26.581847 1 operator/sidecar.go:218] "Waiting for single service informer caches to sync"
I1230 14:38:26.682470 1 operator/sidecar.go:235] "Waiting for Service" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
I1230 14:38:26.686835 1 operator/sidecar.go:269] "Waiting for Pod To have scylla ContainerID set" Pod="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:38:26.691850 1 cache/reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: unknown (get pods)
E1230 14:38:28.203022 1 cache/reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: unknown (get pods)
I1230 14:38:28.203142 1 operator/sidecar.go:323] "Waiting for NodeConfig's data ConfigMap " Selector="scylla-operator.scylladb.com/config-map-type=NodeConfigData,scylla-operator.scylladb.com/owner-uid=5488e48b-c678-4766-ad3b-37e2126c22a2"
I1230 14:38:28.208418 1 operator/sidecar.go:385] "Starting scylla"
I1230 14:38:28.208433 1 config/config.go:64] Setting up scylla.yaml
I1230 14:38:28.208578 1 config/config.go:96] "no scylla.yaml config map available"
I1230 14:38:28.211683 1 config/config.go:68] Setting up cassandra-rackdc.properties
I1230 14:38:28.211727 1 config/config.go:157] "unable to read properties" file="/mnt/scylla-config/cassandra-rackdc.properties"
I1230 14:38:28.211845 1 config/config.go:72] Setting up entrypoint script
I1230 14:38:28.227197 1 config/config.go:253] "Scylla version detected" version={version:{Major:4 Minor:5 Patch:1 Pre:[] Build:[]} unknown:false}
I1230 14:38:28.227270 1 config/config.go:282] "Scylla entrypoint" Command="/docker-entrypoint.py --developer-mode=0 --overprovisioned=1 --smp=1 --prometheus-address=0.0.0.0 --listen-address=0.0.0.0 --broadcast-address=10.245.89.175 --broadcast-rpc-address=10.245.89.175 --seeds=10.245.89.175"
I1230 14:38:28.227340 1 cache/shared_informer.go:240] Waiting for caches to sync for Prober
I1230 14:38:28.227358 1 cache/shared_informer.go:247] Caches are synced for Prober
I1230 14:38:28.227367 1 operator/sidecar.go:414] "Starting Prober server"
I1230 14:38:28.227599 1 sidecar/controller.go:170] "Starting controller" Controller="SidecarController"
I1230 14:38:28.227611 1 cache/shared_informer.go:240] Waiting for caches to sync for SidecarController
I1230 14:38:28.227619 1 cache/shared_informer.go:247] Caches are synced for SidecarController
running: (['/opt/scylladb/scripts/scylla_dev_mode_setup', '--developer-mode', '0'],)
running: (['/opt/scylladb/scripts/scylla_io_setup'],)
ERROR:root:Filesystem at /var/lib/scylla/data has only 9910345728 bytes available; that is less than the recommended 10 GB. Please free up space and run scylla_io_setup again.
failed!
Traceback (most recent call last):
File "/docker-entrypoint.py", line 27, in <module>
setup.io()
File "/scyllasetup.py", line 67, in io
self._run(['/opt/scylladb/scripts/scylla_io_setup'])
File "/scyllasetup.py", line 37, in _run
subprocess.check_call(*args, **kwargs)
File "/opt/scylladb/python3/lib64/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/scylladb/scripts/scylla_io_setup']' returned non-zero exit status 1.
E1230 14:38:31.835289 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:38:41.835672 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:38:51.836392 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:39:01.835292 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:39:11.835123 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:39:21.835211 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:39:31.835776 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:39:41.834980 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:39:51.835903 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:40:01.835599 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:40:11.834676 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:40:21.835544 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:40:31.834940 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:40:41.835909 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:40:51.835099 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:41:01.835945 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:41:11.835740 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:41:21.836038 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:41:31.835636 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
E1230 14:41:41.834754 1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="dial tcp 10.244.2.190:10001: connect: connection refused" Service="scylla/scylla-scylla-us-east-1-us-east-1a-0"
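The container exits because scylla_io_setup refuses to continue when the data filesystem has less than "the recommended 10 GB" free. A minimal sketch of an equivalent free-space check, for illustration only: this is not the actual scylla_io_setup code, and the 10**10-byte threshold (decimal GB) is an assumption. The reported 9910345728 bytes falls short whether "10 GB" means 10**10 or 10 * 2**30 bytes.

```python
import os

# Assumed threshold: the error message says "the recommended 10 GB";
# reading it as 10**10 bytes (decimal GB) is an assumption of this sketch.
MIN_BYTES = 10 * 10**9


def available_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)
    # f_bavail is counted in units of the fragment size f_frsize.
    return st.f_bavail * st.f_frsize


def meets_minimum(avail: int, min_bytes: int = MIN_BYTES) -> bool:
    """Return True if `avail` bytes satisfy the minimum free-space requirement."""
    return avail >= min_bytes


if __name__ == "__main__":
    # Value reported in the log above for /var/lib/scylla/data.
    reported = 9910345728
    print(meets_minimum(reported))  # False
```

Inside a pod, calling available_bytes("/var/lib/scylla/data") would report what the mounted PV actually offers after filesystem overhead, which is the number that matters here rather than the requested capacity.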
I think we need to make the defaults more reasonable and raise the default capacity to at least 15Gi: https://github.com/scylladb/scylla-operator/blob/6e9424fa2c4206c1e3e6fd74b9398e5a36d91f26/helm/scylla/values.yaml#L58
Yeah, I guess there is some filesystem overhead, so we should raise the default.
The issue is still very much live
The patch has not been merged yet, but you can give feedback: does it solve the issue for you?
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 30d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out
/lifecycle stale
/remove-lifecycle stale