CPU and smp calculations are wrong
Describe the bug
Spinning up a Scylla cluster on GKE (n1-standard-8, 8 CPU cores) resulted in:
--smp=6 --cpuset=0-7
Resulting command line:
/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-7 --smp 6 --listen-address 10.142.0.53 --rpc-address 10.142.0.53 --seed-provider-parameters seeds=10.3.243.243,10.3.246.86 --broadcast-address 10.3.246.86 --broadcast-rpc-address 10.3.246.86 --blocked-reactor-notify-ms 999999999
To Reproduce
Steps to reproduce the behavior:
- Deploy the Operator on GKE as advised in ./examples/gke
- Deploy Scylla
- Log into a Scylla node and check the arguments Scylla is running with
Expected behavior
It is expected to be something like:
--smp=7 --cpuset=0-6
or
--smp=6 --cpuset=0-4,6-7
or
--cpuset=0-4,6-7
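For comparison, the small helper below (written just for this issue, not part of Scylla or the operator) counts how many cores a Linux cpuset list covers; it makes it easy to see that --cpuset=0-7 spans all 8 host cores while --smp=6 only asks for 6 shards.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countCPUs returns how many CPUs a Linux cpuset list such as
// "0-7" or "0-4,6-7" covers.
func countCPUs(cpuset string) (int, error) {
	n := 0
	for _, part := range strings.Split(cpuset, ",") {
		lo, hi, isRange := strings.Cut(strings.TrimSpace(part), "-")
		if !isRange {
			// A single CPU index, e.g. "5".
			if _, err := strconv.Atoi(lo); err != nil {
				return 0, err
			}
			n++
			continue
		}
		// An inclusive range, e.g. "0-4".
		a, err := strconv.Atoi(lo)
		if err != nil {
			return 0, err
		}
		b, err := strconv.Atoi(hi)
		if err != nil {
			return 0, err
		}
		n += b - a + 1
	}
	return n, nil
}

func main() {
	for _, c := range []string{"0-7", "0-6", "0-4,6-7"} {
		n, _ := countCPUs(c)
		fmt.Printf("--cpuset=%s covers %d cores\n", c, n)
	}
}
```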
Logs
- https://cloudius-jenkins-test.s3.amazonaws.com/04398b18-cc08-4f3b-8a64-3330751c1f2e/20201130_192731/db-cluster-04398b18.zip
Environment:
- Platform: GKE
- Kubernetes version: 1.15.12-gke.20
- Scylla version: 4.2.0
- Scylla-operator version: nightly
I can confirm all the issues pointed out by @dkropachev (#282 #281 #280 #279). https://github.com/scylladb/scylla-operator/issues/280 and https://github.com/scylladb/scylla-operator/issues/279 are pretty basic stuff. 🤯
SMP is calculated from the user-provided resources, while the cpuset is assigned by the kubelet based on the Pod QoS class. For Burstable and BestEffort Pods the cpuset is set to all cores available on the host, because those Pods get shared CPU access and can run on any CPU. Only for Guaranteed QoS does the Pod get CPUs exclusively, and only then does the smp number match the cpuset.
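To make the distinction concrete, here is a hypothetical sketch (an assumption about one possible reconciliation, not the operator's actual code) that only emits --cpuset when the kubelet-assigned set is exclusive, i.e. when its size matches the requested CPU count, and otherwise passes --smp alone:

```go
package main

import "fmt"

// shardArgs is a hypothetical helper, not the operator's implementation.
// requestedCPUs comes from the Pod's resource spec; cpuset and cpusetCores
// describe the CPU set the kubelet actually assigned to the container.
func shardArgs(requestedCPUs int, cpuset string, cpusetCores int) []string {
	if requestedCPUs == cpusetCores {
		// Guaranteed QoS: the cpuset is exclusive and matches the request,
		// so pin one shard per core.
		return []string{
			fmt.Sprintf("--smp=%d", cpusetCores),
			fmt.Sprintf("--cpuset=%s", cpuset),
		}
	}
	// Burstable/BestEffort: the reported cpuset is just the node's shared
	// pool, so pinning to it is misleading; pass only the shard count.
	return []string{fmt.Sprintf("--smp=%d", requestedCPUs)}
}

func main() {
	// The reported case: 6 CPUs requested, shared cpuset 0-7 on an 8-core node.
	fmt.Println(shardArgs(6, "0-7", 8)) // [--smp=6]
	// A Guaranteed Pod with an exclusive 6-core cpuset.
	fmt.Println(shardArgs(6, "0-5", 6)) // [--smp=6 --cpuset=0-5]
}
```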
@tnozicka, @zimnx, could you please check whether this is still relevant?
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 30d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out
/lifecycle stale