
K8s node draining blocked when persistentStorage with EBS enabled

Open szpuni opened this issue 4 years ago • 2 comments

Describe the bug When using the Sumo Logic Kubernetes collection with a PersistentVolume (EBS) you are unable to rotate instances. When we start draining a node with kubectl drain, the drained node gets a NoSchedule taint and the first pod starts to be evicted.
Because the persistent volume is of type EBS, the evicted pod cannot start on a new node due to the storage constraint: it stays in Pending because it is trying to attach a volume that is still bound to the old node.

A solution to this would be EFS storage (or any NFS-type storage), but that doesn't work either.
In the Helm chart's values.yaml I saw that you can specify a StorageClass to use instead of the default one, so I changed the storage class value and pointed it at my EFS storage class, backed by the EFS CSI driver.
Unfortunately this doesn't work and the PVC stays in Pending even though the correct StorageClass and PersistentVolume are available. Maybe the name plays a role here, I'm not sure.

Logs Node draining issue:

kubectl drain ip-10-100-8-252.eu-west-1.compute.internal --ignore-daemonsets --delete-local-data
node/ip-10-100-8-252.eu-west-1.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/aws-node-7l6bf, kube-system/calico-node-grjxz, kube-system/efs-csi-node-64l49, kube-system/kube-proxy-mj5zl, monitoring/sumologic-fluent-bit-6qxz9
evicting pod monitoring/sumologic-sumologic-fluentd-logs-0
evicting pod monitoring/sumologic-sumologic-fluentd-logs-1
evicting pod monitoring/sumologic-sumologic-fluentd-logs-2
error when evicting pod "sumologic-sumologic-fluentd-logs-1" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
error when evicting pod "sumologic-sumologic-fluentd-logs-2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Pod 0 is constantly in the Pending state:

NAME                                 READY   STATUS    RESTARTS   AGE
sumologic-fluent-bit-6qxz9           1/1     Running   0          14m
sumologic-fluent-bit-82xtn           1/1     Running   0          4m12s
sumologic-fluent-bit-cbhs2           1/1     Running   0          14m
sumologic-fluent-bit-pwtj6           1/1     Running   0          14m
sumologic-sumologic-fluentd-logs-0   0/1     Pending   0          88s
sumologic-sumologic-fluentd-logs-1   1/1     Running   0          14m
sumologic-sumologic-fluentd-logs-2   1/1     Running   0          14m

Pod description:

Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  2m15s                cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector, 1 max node group size reached
  Warning  FailedScheduling   84s (x3 over 2m26s)  default-scheduler   0/4 nodes are available: 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) didn't match node selector.
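
The volume node affinity conflict can be confirmed by inspecting the PV behind the stuck Pod's claim. A minimal check, assuming the default gp2 (EBS) StorageClass and the claim names shown in this issue:

# Find the PV bound to the stuck Pod's claim
kubectl get pvc buffer-sumologic-sumologic-fluentd-logs-0 -n monitoring

# Show the availability zone the EBS-backed volume is pinned to
# (replace <volume-name> with the VOLUME value from the previous command)
kubectl get pv <volume-name> -o jsonpath='{.spec.nodeAffinity}'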

Persistent Volume logs when EFS is being used:

Persistent Volume Claim

NAME                                        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
buffer-sumologic-sumologic-fluentd-logs-0   Pending                                      efs-sc-prometheus   19s
buffer-sumologic-sumologic-fluentd-logs-1   Pending                                      efs-sc-prometheus   19s
buffer-sumologic-sumologic-fluentd-logs-2   Pending                                      efs-sc-prometheus   19s

Persistent Volume

NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS        REASON   AGE
pv-efs-prometheus   50Gi       RWX            Retain           Available           efs-sc-prometheus            6d17h

Storage Class

NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
efs-sc-prometheus   efs.csi.aws.com         Delete          Immediate              true                   6d17h
gp2 (default)       kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  84d

Description of the Persistent Volume Claim:

Name:          buffer-sumologic-sumologic-fluentd-logs-0
Namespace:     monitoring
StorageClass:  efs-sc-prometheus
Status:        Pending
Volume:
Labels:        app=sumologic-sumologic-fluentd-logs
Annotations:   volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    sumologic-sumologic-fluentd-logs-0
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  ExternalProvisioning  11s (x4 over 45s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

Command used to install/upgrade Collection: not provided.

Configuration

With the default (EBS-backed) storage class:

fluentd:
  persistence:
    enabled: true

With the EFS storage class:

fluentd:
  persistence:
    enabled: true
    storageClass: "efs-sc-prometheus"
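
Since the exact install/upgrade command was not recorded, a typical way to apply the values above would be something like the following; the release name, namespace and values file name are assumptions based on this issue:

# Assumed Helm repo setup and upgrade command; adjust names to your setup
helm repo add sumologic https://sumologic.github.io/sumologic-kubernetes-collection
helm upgrade --install sumologic sumologic/sumologic --namespace monitoring -f values.yaml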

To Reproduce Install the Sumologic chart with EBS-backed persistent volumes and multiple Fluentd pods, then drain a node with kubectl drain NODENAME. For the second problem, install the AWS EFS CSI driver, then create a StorageClass and a PersistentVolume using that class:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc-prometheus
  labels:
    app: prometheus-operator
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-efs-prometheus
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Gi
  nfs:
    path: /
    server: fs-123456789.efs.eu-west-1.amazonaws.com
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc-prometheus
  volumeMode: Filesystem
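
For completeness, the manifests above can be applied and checked with standard kubectl commands; the file name below is only illustrative:

# Apply the StorageClass and PersistentVolume
kubectl apply -f efs-sc-and-pv.yaml

# Verify the class exists and the PV is Available
kubectl get storageclass efs-sc-prometheus
kubectl get pv pv-efs-prometheus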

Expected behavior Draining a node should succeed, with the persistent volume either recreated on the new node or a new EBS volume attached to it. If that is not possible, add an option that allows the use of network file systems such as EFS. That doesn't seem to work either: the PVC never gets bound. In my case the PV itself works correctly, because I have used the exact same EFS-backed PV for Prometheus storage, which also runs as a StatefulSet.

Environment (please complete the following information):

  • Collection version (e.g. helm ls -n sumologic): sumologic-1.3.5
  • Kubernetes version (e.g. kubectl version): v1.18.9-eks-d1db3c
  • Cloud provider: AWS
  • Others: EBS and EFS

szpuni avatar Jan 14 '21 08:01 szpuni

Hi @szpuni , thank you for your report!

Persistent Volumes on EKS (EBS-backed) are bound to an availability zone, so recreating the Pod with its attached volume in another AZ results in the error you see: 1 node(s) had volume node affinity conflict.

To overcome this you can delete the bound PVC by hand and then restart the Fluentd Pod; a new PVC and Pod should then be created in the desired AZ.
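
For example, for the Pod stuck in Pending shown in this issue (names taken from the report, adjust the release name and namespace to your setup):

# Delete the PVC bound to the old AZ; the deletion only completes once the Pod is gone,
# because of the kubernetes.io/pvc-protection finalizer
kubectl delete pvc buffer-sumologic-sumologic-fluentd-logs-0 -n monitoring

# Delete the Pod; the StatefulSet recreates it and a new PVC is provisioned in the right AZ
kubectl delete pod sumologic-sumologic-fluentd-logs-0 -n monitoring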

I've just tested this scenario and we've added info into our troubleshooting section: https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/release-v2.0/deploy/docs/Troubleshoot_Collection.md#fluentd-pod-stuck-in-pending-state-after-recreation

I understand this is not ideal and we will work on automating that in the future.

Please let me know if that works for you for the time being.

perk-sumo avatar Apr 02 '21 14:04 perk-sumo

I understand this is not ideal and we will work on automating that in the future.

@perk-sumo has this been automated?

schlagerVIZIO avatar Mar 22 '23 23:03 schlagerVIZIO