Large cluster deployment with 8 GiB RAM fails
This appears to be a cluster advice issue. In this case, the tempo-ingester pods cannot be scheduled:
```
Name:                 tempo-ingester-0
Namespace:            neon-monitor
Priority:             900000000
Priority Class Name:  neon-min
Node:                 <none>
Labels:               app.kubernetes.io/component=ingester
                      app.kubernetes.io/instance=tempo
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=tempo
                      app.kubernetes.io/version=1.3.2
                      controller-revision-hash=tempo-ingester-848cd8689
                      statefulset.kubernetes.io/pod-name=tempo-ingester-0
                      tempo-gossip-member=true
Annotations:          checksum/config: c764e248482a115a73aaa4678cf3e9a5b9ead286adccfc738cfc4a2e3f314e1c
                      sidecar.istio.io/inject: false
                      traffic.sidecar.istio.io/excludeInboundPorts: 7946
                      traffic.sidecar.istio.io/excludeOutboundPorts: 7946
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        StatefulSet/tempo-ingester
Containers:
  ingester:
    Image:       registry.neon.local/neonkube/grafana-tempo:2.0.0
    Ports:       9095/TCP, 7946/TCP, 3100/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Args:
      -target=ingester
      -config.file=/conf/tempo.yaml
      -mem-ballast-size-mbs=64
      -config.expand-env=true
    Limits:
      memory:  1Gi
    Requests:
      memory:   1Gi
    Readiness:  http-get http://:http/ready delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ACCESS_KEY_ID:      <set to the key 'accesskey' in secret 'minio'>  Optional: false
      SECRET_ACCESS_KEY:  <set to the key 'secretkey' in secret 'minio'>  Optional: false
      GOGC:               10
    Mounts:
      /conf from tempo-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xknb7 (ro)
      /var/tempo from data (rw)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-tempo-ingester-0
    ReadOnly:   false
  tempo-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tempo
    Optional:  false
  kube-api-access-xknb7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:        Burstable
Node-Selectors:   node.neonkube.io/monitor.traces-internal=true
Tolerations:      node.kubernetes.io/not-ready:NoExecute op=Exists for 30s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 30s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  12m                default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Insufficient memory, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't find available persistent volumes to bind. preemption: 0/6 nodes are available: 1 Insufficient memory, 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  11m (x4 over 12m)  default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Insufficient memory, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 1 Insufficient memory, 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  4m (x5 over 11m)   default-scheduler  0/6 nodes are available: 3 Insufficient memory, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 1 Insufficient memory, 5 Preemption is not helpful for scheduling.
```
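To make the "Insufficient memory" failure concrete, here's a rough back-of-the-envelope sketch of the scheduling math on one 8 GiB worker. Only the 1 GiB tempo-ingester request comes from the describe output above; the reserved, daemonset, and other-component figures are illustrative assumptions, not measured values:

```python
# Rough scheduling math for one 8 GiB worker node. Only the 1 GiB
# tempo-ingester request is taken from the pod spec above; everything
# else is an assumed placeholder.

GIB = 1024 ** 3

node_ram           = 8 * GIB
system_reserved    = int(1.5 * GIB)  # OS + kubelet + eviction headroom (assumed)
daemonset_requests = 1 * GIB         # per-node daemonsets, e.g. CNI/Istio (assumed)

allocatable = node_ram - system_reserved - daemonset_requests

# Memory requests competing for the node; all but tempo-ingester are placeholders.
requests = {
    "tempo-ingester":   1 * GIB,  # from the pod spec above
    "other-monitoring": 5 * GIB,  # assumed: rest of the neon-monitor stack
}

total = sum(requests.values())
print(f"allocatable: {allocatable / GIB:.1f} GiB, requested: {total / GIB:.1f} GiB")
if total > allocatable:
    print("-> scheduler reports 'Insufficient memory' for tempo-ingester")
```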
I've temporarily bumped the node RAM for these test clusters from 8 GiB to 16 GiB.
@marcusbooyah looked at this and it's a problem with cluster advice. He hacked around this for clusters with 10 nodes or fewer, but we'll need to put more effort into how cluster advice works.
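For context, the workaround presumably amounts to a heuristic along these lines (a minimal sketch; the function name, thresholds, and reduced request size are all assumptions, not the actual cluster advice code):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def tempo_ingester_request(node_count: int, node_ram: int) -> int:
    """Hypothetical cluster-advice rule: shrink the ingester's memory
    request on small, low-RAM clusters so it remains schedulable;
    otherwise keep the 1Gi chart default seen in the pod spec above."""
    if node_count <= 10 and node_ram < 16 * GIB:
        return 512 * MIB  # reduced request (assumed value)
    return 1 * GIB        # chart default

print(tempo_ingester_request(6, 8 * GIB) // MIB, "MiB")    # small cluster  -> 512 MiB
print(tempo_ingester_request(20, 32 * GIB) // MIB, "MiB")  # large cluster  -> 1024 MiB
```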
For now, we should just recommend a 16 GiB minimum.
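If we go that route, cluster definition validation could enforce it with a check along these lines (a hypothetical sketch; neonKUBE's actual validation code and names will differ):

```python
GIB = 1024 ** 3
MIN_NODE_RAM = 16 * GIB  # recommended minimum from this issue

def check_node_ram(node_name: str, ram_bytes: int) -> None:
    """Fail cluster definition validation for nodes below the minimum."""
    if ram_bytes < MIN_NODE_RAM:
        raise ValueError(
            f"node [{node_name}]: {ram_bytes / GIB:.0f} GiB RAM is below the "
            f"{MIN_NODE_RAM / GIB:.0f} GiB minimum needed by the monitoring stack")

check_node_ram("worker-0", 16 * GIB)   # passes
# check_node_ram("worker-1", 8 * GIB)  # raises ValueError
```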