Deployment stuck in wait-for-pegasearch

Open pk-mittal opened this issue 2 years ago • 12 comments

Hello team,

Our deployment is stuck in wait-for-pegasearch and has been running for a long time.

I am using a private repo for the pegasearch image.

image

pk-mittal avatar Mar 11 '22 11:03 pk-mittal

My initial guess is that the search image is failing to pull. Can you check the logs for the search container? I think its name is srs-service.

RyanStan avatar Mar 14 '22 21:03 RyanStan

Hello Ryan, Thank you. I am unable to pull logs for the search container. I have also tried to install Pega on minikube, but it ends up in the same state.

There I got an error related to a PVC:

0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.

Thanks, Praveen

pk-mittal avatar Mar 15 '22 09:03 pk-mittal

Hi Praveen, Can you run a describe pod command on srs-service? The events in the output may have more detail on which specific PersistentVolumeClaim didn't bind. If you configured the tiers with a volumeClaimTemplate then you'll need to manually configure PersistentVolume resources or a StorageClass for dynamic provisioning.

Also, what action are you performing? Install-deploy? If so, was the install successful?

RyanStan avatar Mar 15 '22 19:03 RyanStan

Hello Ryan, I am not an expert in Azure. Here is the result of the describe command.

Thanks, Praveen

pk-mittal avatar Mar 16 '22 10:03 pk-mittal

I am performing the deploy action, and yes, the install was fine (the pega-db-install job is in Completed status).

pk-mittal avatar Mar 16 '22 10:03 pk-mittal

Ok, thanks for the info. It looks like your attachment was for kubectl describe nodes. Can you run kubectl describe pods and resend the output here? If you did not deploy the pega chart to the default namespace, then you'll also have to specify the namespace with the -n option.

RyanStan avatar Mar 16 '22 14:03 RyanStan

Hello Ryan, Thank you for your help.

Here is the command output.

Regards, Praveen

pk-mittal avatar Mar 16 '22 14:03 pk-mittal

Looking at the events under the container "search", I see the following event warnings.

Warning  FailedMount         6m5s (x4414 over 8d)     kubelet                  Unable to attach or mount volumes: unmounted volumes=[esstorage], unattached volumes=[default-token-fshxs esstorage]: timed out waiting for the condition
Warning  FailedAttachVolume  5m48s (x3004 over 4d5h)  pega  AttachVolume.Attach failed for volume "pvc-36b53c97-5d24-42e8-8300-7f0aab37d7eb" : Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/disks/kubernetes-dynamic-pvc-36b53c97-5d24-42e8-8300-7f0aab37d7eb' under resource group 'mc_pega_showcase_rg_helaba-showcae-cluster_westeurope' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"}}
Warning  FailedMount         93s (x1204 over 8d)      kubelet                  Unable to attach or mount volumes: unmounted volumes=[esstorage], unattached volumes=[esstorage default-token-fshxs]: timed out waiting for the condition

The volume corresponding to the esstorage-pega-search-0 claim is failing to attach to the pod because the controller is getting a 404 ResourceNotFound error from Azure for the underlying disk. The chart references a volumeClaimTemplate for esstorage, so we expect that storage to exist, whether it is specified in a Kubernetes PersistentVolume resource or provisioned through a StorageClass resource.
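To show where a claim name like esstorage-pega-search-0 comes from, here is a minimal sketch of a StatefulSet with a volumeClaimTemplate. It is not the chart's actual template: the image, mount path, labels, and storage size are placeholders, and the StatefulSet name pega-search is inferred from the claim name in the events above. Kubernetes names each generated claim `<template name>-<StatefulSet name>-<ordinal>`.

```yaml
# Hypothetical, trimmed StatefulSet for illustration only.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pega-search
spec:
  serviceName: pega-search
  replicas: 1
  selector:
    matchLabels:
      app: pega-search            # placeholder label
  template:
    metadata:
      labels:
        app: pega-search
    spec:
      containers:
        - name: search
          image: registry.example.com/pega/search:latest   # placeholder image
          volumeMounts:
            - name: esstorage     # must match the volumeClaimTemplate name below
              mountPath: /usr/share/elasticsearch/data      # assumed mount path
  volumeClaimTemplates:
    - metadata:
        name: esstorage           # yields the PVC esstorage-pega-search-0
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi          # illustrative size
        # storageClassName is omitted here, so the cluster's default
        # StorageClass (if one exists) is used for dynamic provisioning.
```

If no default StorageClass exists and no matching PersistentVolume has been created, a claim generated from such a template stays unbound and the pod cannot start.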

Whether you have to manually configure this storage and associated resources, or whether AKS should do it automatically for your cluster, I'm not sure. This Azure guide looks useful though: Storage options for applications in Azure Kubernetes Service (AKS).

Hopefully this helps! Configuring a PersistentVolume or StorageClass for pegasearch seems like something we'll want to note in our documentation.
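As a rough sketch of the StorageClass option on AKS, something like the following could be used for dynamic provisioning. The name pega-search-disk and the parameter values are illustrative assumptions, not settings from the chart. The provisioner shown is the Azure Disk CSI driver; older clusters may use the in-tree kubernetes.io/azure-disk provisioner instead.

```yaml
# Hypothetical StorageClass for dynamically provisioned Azure managed disks.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pega-search-disk                  # hypothetical name
provisioner: disk.csi.azure.com           # Azure Disk CSI driver
parameters:
  skuName: StandardSSD_LRS                # illustrative disk SKU
reclaimPolicy: Retain                     # keep the disk if the claim is deleted
volumeBindingMode: WaitForFirstConsumer   # provision once a pod is scheduled
allowVolumeExpansion: true
```

A StorageClass only affects claims created after it exists. A claim that is already bound to a volume whose disk has been deleted in Azure would most likely need to be recreated.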

RyanStan avatar Mar 16 '22 18:03 RyanStan

Hello Ryan, Thank you for your help. One quick question: if I run the pega.yaml chart again with action=deploy, it will not delete the rules created in the Pega application (no changes in the database, so I will not lose my development work). Is that correct, or is my understanding wrong? Thanks, Praveen

pk-mittal avatar Mar 17 '22 10:03 pk-mittal

Yep you're right! As long as you have your rules saved, they won't be lost when you deploy again.

RyanStan avatar Mar 17 '22 14:03 RyanStan

Hello Ryan,

I have run pega.yaml again after applying the changes described in this article:

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

I have added the code below to pega.yaml for the web, batch, and stream tiers.

  resources:
    requests:
      memory: "16Gi"
      cpu: "6"
    limits:
      memory: "20Gi"
      cpu: "8"

I think we need to put these limits in the template. Can I open a pull request for these changes?

Thanks, Praveen

pk-mittal avatar Mar 18 '22 08:03 pk-mittal

Hi Praveen, our templates for the web, batch, and stream tiers already specify default resource limits of 12Gi for memory and 3 CPU units, so I don't think we want to merge these changes into the values file. Thank you for bringing this up, though.
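For a single deployment that needs more than the defaults, the usual approach would be to adjust the tier entries in your own values file instead of changing the chart. A hedged sketch, assuming the chart version in use exposes per-tier keys named cpuRequest, cpuLimit, memRequest, and memLimit (check the chart README for your version); the numbers are the ones from the comment above:

```yaml
# Fragment of a values file, not a complete one. Key names are assumptions
# based on the chart's per-tier resource settings.
global:
  tier:
    - name: "web"
      cpuRequest: 6
      cpuLimit: 8
      memRequest: "16Gi"
      memLimit: "20Gi"
    # The batch and stream tiers would be edited the same way. Helm replaces
    # lists wholesale, so keep every existing tier entry in the file.
```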

RyanStan avatar Mar 22 '22 12:03 RyanStan

Closing the issue with the above provided explanation.

MadhuriArugula avatar Nov 02 '22 08:11 MadhuriArugula