Loki helm chart version upgrade from 5.44.4 to 6.5.0 issues in Single Binary deployment mode.
We are using Azure Kubernetes Service, with 1 system node in a system node pool and 3 user nodes in a user node pool, for deploying Loki.
Kubernetes version: 1.29.2
- Affinity was working fine in 5.44.4 and all the pods were landing in the user node pool. After upgrading to 6.5.0, setting the same affinity produces the error below.
Error
May 7th 2024 10:59:51
coalesce.go:286: warning: cannot overwrite table with non table for loki.singleBinary.affinity (map[podAntiAffinity:map[requiredDuringSchedulingIgnoredDuringExecution:[map[labelSelector:map[matchLabels:map[app.kubernetes.io/component:single-binary]] topologyKey:kubernetes.io/hostname]]]])
Error: UPGRADE FAILED: execution error at (loki/templates/validate.yaml:31:4): You have more than zero replicas configured for both the single binary and simple scalable targets. If this was intentional change the deploymentMode to the transitional 'SingleBinary<->SimpleScalable' mode
Helm Upgrade returned non-zero exit code: 1. Deployment terminated.
Fatal: The remote script failed with exit code 1
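Our reading of the coalesce warning (a guess on our side, not verified against the chart source) is that in chart 6.x the default for `loki.singleBinary.affinity` is a YAML map, while our 5.x values pass it as a multi-line string (`affinity: |`), which Helm cannot merge over a table. A sketch of the same affinity expressed as a structured map, assuming the 6.x chart accepts plain objects here:

```yaml
# Sketch: affinity as a structured map instead of a string template.
# Assumption: chart 6.x merges this over its podAntiAffinity default.
singleBinary:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: In
                values:
                  - user
```

If this is correct, it would also explain why the string form worked in 5.44.4, where the chart `tpl`-rendered affinity strings.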
ubuntu@NARU-Pr5530:~$ kubectl describe pod loki-chunks-cache-0 -n loki | tail -5
Type     Reason             Age    From                Message
Warning  FailedScheduling   2m43s  default-scheduler   0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Warning  FailedScheduling   2m42s  default-scheduler   0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Normal   NotTriggerScaleUp  2m40s  cluster-autoscaler  pod didn't trigger scale-up: 1 max node group size reached
If we do not set affinity in 6.5.0, a few pods land on the system node and fail with resource issues, and we also do not want pods landing on the system node at all. Is there any way to fix this?
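One approach we are considering (a sketch only; it assumes the 6.x chart exposes `nodeSelector` and `resources` for these components, as the upstream values.yaml linked below suggests) is pinning the components to the user pool via the AKS node-pool mode label, and shrinking the chunks-cache requests so the pod fits a user node:

```yaml
# Sketch, unverified against chart 6.5.0: per-component nodeSelector and
# reduced chunksCache requests. The label kubernetes.azure.com/mode=user
# is what AKS puts on user-pool nodes; the resource numbers are examples.
singleBinary:
  nodeSelector:
    kubernetes.azure.com/mode: user
chunksCache:
  nodeSelector:
    kubernetes.azure.com/mode: user
  resources:
    requests:
      cpu: 100m
      memory: 1229Mi
```

Alternatively, tainting the system node pool with `CriticalAddonsOnly=true:NoSchedule` (the approach Microsoft documents for dedicated system pools) would keep workload pods off the system node without per-chart changes.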
values.yaml (used in 5.44.4)
--- https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml
loki:
  auth_enabled: false
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  query_range:
    parallelise_shardable_queries: false
    split_queries_by_interval: 0
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
singleBinary:
  replicas: 1
  persistence:
    size: 50Gi
    enableStatefulSetAutoDeletePVC: true
  affinity: |
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: In
                values:
                  - user
          weight: 50
values.yaml (used in 6.5.0)
--- https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml
deploymentMode: SingleBinary
loki:
  auth_enabled: false
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  query_range:
    parallelise_shardable_queries: false
  limits_config:
    split_queries_by_interval: 0
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: 2024-04-01
        object_store: filesystem
        store: tsdb
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_encoding: snappy
  tracing:
    enabled: true
  querier:
    max_concurrent: 1
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
singleBinary:
  replicas: 1
  persistence:
    size: 50Gi
    enableStatefulSetAutoDeletePVC: true
    enabled: true
  extraArgs:
    - -config.expand-env=true
chunksCache:
  allocatedMemory: 1024
  writebackSizeLimit: 10MB
- After upgrading to 6.5.0, the loki-0 pod goes into CrashLoopBackOff with the error below.
Error
ubuntu@NARU-Pr5530:~$ kubectl logs loki-0 -n loki
failed parsing config: /etc/loki/config/config.yaml: yaml: unmarshal errors:
  line 2: field Error not found in type loki.ConfigWrapper
ubuntu@NARU-Pr5530:~$
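Since the parser complains about a field named `Error` at line 2 of the rendered config, our guess (unverified) is that a stray pasted line ended up inside the `loki:` block of values.yaml and was rendered into the ConfigMap. A hypothetical reconstruction of what the broken rendered file would look like:

```yaml
# Hypothetical, for illustration only: a stray key at line 2 of the
# rendered /etc/loki/config/config.yaml would produce exactly
# "line 2: field Error not found in type loki.ConfigWrapper".
auth_enabled: false
Error: ...
```

Comparing the `config.yaml` key of the rendered ConfigMap against the values file should confirm or rule this out.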
ubuntu@NARU-Pr5530:~$ kubectl get pods -n loki -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
loki-0 0/1 CrashLoopBackOff 1 (11s ago) 74s 10.101.80.28 aks-npu2-21504394-vmss00000f
Any guidance on resolving these issues would be appreciated.
Thanks,
Naresh