
Flux GitOps script for arc-enabled AWS EKS

Open jwaltireland opened this issue 2 years ago • 20 comments

There is currently no script to enable GitOps for Arc-enabled AWS EKS. There are instructions for AKS and a script for GKE, but nothing for EKS.

Will this be added to the Jumpstart documentation and/or the Git repo?

jwaltireland avatar Jul 20 '22 14:07 jwaltireland

Hey friend! Thanks for opening this issue. We appreciate your contribution and welcome you to our community! We are glad to have you here and to have your input on the Azure Arc Jumpstart.

github-actions[bot] avatar Jul 20 '22 14:07 github-actions[bot]

Also, I attempted to manually create the GitOps configuration from the Azure portal, using the GitOps config from the Arc-enabled AKS cluster, but I got the following errors:

Failed to create the GitOps configuration and/or Flux extension on 'Arc-EKS-Demo'. Error: The extension operation failed with the following error: error: Unable to get the status from the local CRD with the error : {Error : Retry for given duration didn't get any results with err {status not populated}}.. Please refer to this document: https://aka.ms/gitops-troubleshooting to troubleshoot the issue.

Failed to create the GitOps configuration and/or Flux extension on 'Arc-EKS-Demo'. Error: The extension operation failed with the following error: unable to add the configuration with configId {extension:flux} due to error: {error while adding the CRD configuration: error {Operation cannot be fulfilled on extensionconfigs.clusterconfig.azure.com "flux": the object has been modified; please apply your changes to the latest version and try again}}.. Please refer to this document: https://aka.ms/gitops-troubleshooting to troubleshoot the issue.

jwaltireland avatar Jul 20 '22 16:07 jwaltireland

@zaidmohd do you mind taking a look?

likamrat avatar Jul 20 '22 16:07 likamrat

@jwaltireland The GKE script should work, can you try and share the result? Before running the script make sure that the kubectl context is pointing to your Azure Arc-enabled Kubernetes cluster. https://github.com/microsoft/azure_arc/blob/main/azure_arc_k8s_jumpstart/gke/gitops/basic/az_k8sconfig_gke.sh

zaidmohd avatar Jul 21 '22 03:07 zaidmohd
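
One way to act on the advice above is to compare the active kubectl context with the one you expect before running the script. This is only a minimal sketch: the helper name `check_context` and the context name in the usage note are hypothetical and do not come from the Jumpstart repo.

```shell
# Hypothetical helper: succeed only if the active kubectl context matches
# the expected name passed as the first argument.
check_context() {
  expected="$1"
  # Return 2 if kubectl itself fails (for example, no kubeconfig).
  current="$(kubectl config current-context)" || return 2
  if [ "$current" = "$expected" ]; then
    echo "kubectl context OK: $current"
  else
    echo "kubectl context is '$current', expected '$expected'" >&2
    return 1
  fi
}
```

Usage might look like `check_context my-eks-context && sh az_k8sconfig_gke.sh`, where `my-eks-context` stands in for whatever `kubectl config get-contexts` lists for the EKS cluster.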

Interestingly, I have been trying to use the GKE script. However, it's possible my Mac is set up incorrectly, as I'm getting errors on the helm update command on line 34:

Error: Kubernetes cluster unreachable: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"

Searching the web told me to downgrade Helm, which I've done, but the 3.9 version is still there.

Sorry, I'm new to all this.

jwaltireland avatar Jul 21 '22 12:07 jwaltireland

This is what I've tried:

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
DESIRED_VERSION=v3.8.2 bash get_helm.sh

from this site: https://github.com/helm/helm/issues/10975

jwaltireland avatar Jul 21 '22 12:07 jwaltireland
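
The error above names an exec plugin apiVersion, so one thing worth checking is whether the kubeconfig itself still references `v1alpha1`. That this is the cause here is an assumption; nothing in this thread confirms it. The sketch below uses a hypothetical helper name, `uses_v1alpha1_exec`, and defaults to the usual `~/.kube/config` path.

```shell
# Hypothetical helper: report whether a kubeconfig still references the
# v1alpha1 exec-credential API named in the error message.
# Argument: path to the kubeconfig (defaults to ~/.kube/config).
uses_v1alpha1_exec() {
  kubeconfig="${1:-$HOME/.kube/config}"
  # -F treats the pattern as a fixed string, -q suppresses output.
  if grep -qF 'client.authentication.k8s.io/v1alpha1' "$kubeconfig"; then
    echo "v1alpha1 exec plugin entry found in $kubeconfig"
  else
    echo "no v1alpha1 exec plugin entry in $kubeconfig"
    return 1
  fi
}
```

If the entry is found, regenerating the kubeconfig with a current AWS CLI (`aws eks update-kubeconfig`) is a commonly suggested alternative to downgrading Helm, though whether it resolves this particular setup is untested here.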

Update: I decided to run the script in Azure Cloud Shell. I had already tried to create the GitOps configuration manually from the Azure portal and got errors around the Flux extension, and that manual process has left the Flux extension in a pending state.

So I ran the az command to delete the extension, and here's the error:

john@Azure:~$ az k8s-extension delete -g Arc-AKS-Demo -c Arc-EKS-Demo -n flux -t connectedClusters --yes
The command requires the extension k8s-extension. Do you want to install it now? The command will continue to run after the extension is installed. (Y/n): y
Run 'az config set extension.use_dynamic_install=yes_without_prompt' to allow installing extensions without prompt.
(ExtensionOperationFailed) The extension operation failed with the following error: error: Unable to get the status from the local CRD with the error : {Error : Retry for given duration didn't get any results with err {status not populated}}.
Code: ExtensionOperationFailed
Message: The extension operation failed with the following error: error: Unable to get the status from the local CRD with the error : {Error : Retry for given duration didn't get any results with err {status not populated}}.

I then decided to delete the Azure Arc instance for EKS from the portal so that I could start from scratch, and that failed too. The only entry in the activity log is: Failed to delete Kubernetes - Azure Arc cluster 'Arc-EKS-Demo'. Error: 'error'

jwaltireland avatar Jul 21 '22 14:07 jwaltireland

Update: I tried deleting the Arc-enabled cluster in the portal again, and it worked this time. I will update once I re-enable EKS and test running the GitOps script.

jwaltireland avatar Jul 21 '22 14:07 jwaltireland

Update: running the script fails:

Creating GitOps config for Hello-Arc app
'Microsoft.Flux' extension not found on the cluster, installing it now. This may take a few minutes...
(ExtensionOperationFailed) The extension operation failed with the following error: error: Unable to get the status from the local CRD with the error : {Error : Retry for given duration didn't get any results with err {status not populated}}.
Code: ExtensionOperationFailed
Message: The extension operation failed with the following error: error: Unable to get the status from the local CRD with the error : {Error : Retry for given duration didn't get any results with err {status not populated}}.
Creating GitOps config for Hello-Arc Ingress
Error! 'Microsoft.Flux' extension is installed but not in a succeeded state on the cluster. Unable to proceed with Flux v2 configuration install. Try resolving the extension error on the cluster or removing and re-installing the extension.

jwaltireland avatar Jul 21 '22 16:07 jwaltireland
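
Since the script refuses to continue unless the extension is in a succeeded state, it can help to query that state directly before re-running it. A minimal sketch, assuming the extension is named `flux` as in the delete command earlier in this thread; the helper name `flux_extension_state` is hypothetical.

```shell
# Hypothetical helper: print the provisioning state of the flux extension
# on an Arc-connected cluster.
# Arguments: resource group, connected cluster name.
flux_extension_state() {
  az k8s-extension show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --cluster-type connectedClusters \
    --name flux \
    --query provisioningState \
    --output tsv
}
```

With the names used in this thread, that would be `flux_extension_state Arc-AKS-Demo Arc-EKS-Demo`. Anything other than `Succeeded` suggests the extension needs to be fixed or removed before the GitOps configuration can be created.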

@jwaltireland any updates?

likamrat avatar Jul 26 '22 18:07 likamrat

No updates. I can't move past the issues.

Have you all tried setting up GitOps for EKS in Arc?

jwaltireland avatar Jul 26 '22 19:07 jwaltireland

@jwaltireland Can you please share the output of "kubectl get pods -n flux-system" so we can check the status of Flux?

zaidmohd avatar Jul 27 '22 19:07 zaidmohd

@jwaltireland I tested the script and it works. Can you ensure the EKS cluster is configured to auto-scale or configured with 3 nodes? I got a similar Flux extension deployment failure because all Flux pods were in a pending state due to node unavailability.

zaidmohd avatar Jul 28 '22 00:07 zaidmohd
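
The pending-pod symptom described above can be spotted quickly by counting Pending entries in the `kubectl get pods` output. A minimal sketch; the helper name `count_pending_pods` is hypothetical, and it assumes the default column layout where STATUS is the third column.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print how many pods have STATUS (third column) equal to Pending.
count_pending_pods() {
  # n + 0 forces a numeric 0 when no line matched.
  awk '$3 == "Pending" { n++ } END { print n + 0 }'
}
```

Usage: `kubectl get pods -n flux-system --no-headers | count_pending_pods`. A non-zero result is a cue to run `kubectl describe pod` on one of the pending pods and read the scheduling events.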

ok, I won't be able to get to this today. hopefully in the next few days I'll confirm if it works.

jwaltireland avatar Jul 28 '22 12:07 jwaltireland

Quoting @zaidmohd: "@jwaltireland I tested the script and it works. Can you ensure the EKS cluster is configured to auto-scale or configured with 3 nodes. I got a similar Flux extension deployment failure error as all Flux pods were in pending state due to node unavailability."

I want to update the Jumpstart EKS Terraform module to do this.

Is it just a matter of modifying the scaling_config block?

resource "aws_eks_node_group" "arcdemo" {
  cluster_name    = var.cluster_name
  node_group_name = "arcdemo"
  node_role_arn   = aws_iam_role.arcdemo-node.arn
  subnet_ids      = var.cluster_subnet_ids
  instance_types  = ["t2.medium"]

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  depends_on = [
    aws_iam_role_policy_attachment.arcdemo-node-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.arcdemo-node-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.arcdemo-node-AmazonEC2ContainerRegistryReadOnly,
  ]
}

jwaltireland avatar Jul 29 '22 14:07 jwaltireland

@jwaltireland The config below works for the GitOps test.

scaling_config {
  desired_size = 3
  max_size     = 3
  min_size     = 1
}

zaidmohd avatar Aug 03 '22 01:08 zaidmohd
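
After applying a scaling change like the one above, it may be worth confirming that the expected number of nodes is actually Ready before re-running the GitOps script. A minimal sketch; the helper name `has_min_ready_nodes` is hypothetical, and it assumes the default `kubectl get nodes` layout where STATUS is the second column.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and succeed only if at least the given number of nodes report Ready.
# Argument: minimum number of Ready nodes.
has_min_ready_nodes() {
  min="$1"
  # Count lines whose STATUS column is exactly "Ready".
  ready="$(awk '$2 == "Ready" { n++ } END { print n + 0 }')"
  echo "ready nodes: $ready (minimum $min)"
  [ "$ready" -ge "$min" ]
}
```

Usage: `kubectl get nodes --no-headers | has_min_ready_nodes 3`, matching the desired_size of 3 in the config above.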

Thanks @zaidmohd, it looks like increasing the node count allows the script to run successfully! I can see the GitOps configuration in Azure Arc now.

Thanks for all the help.

jwaltireland avatar Aug 03 '22 13:08 jwaltireland

Thx for the feedback @jwaltireland. We will update the EKS scenario with a dedicated PR, and I will close this issue for now.

likamrat avatar Aug 03 '22 14:08 likamrat

@jwaltireland would you like to open a PR to update the scenario?

likamrat avatar Aug 03 '22 16:08 likamrat

in reference to #1308?

jwaltireland avatar Aug 03 '22 16:08 jwaltireland

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 10 days.

github-actions[bot] avatar Sep 03 '22 00:09 github-actions[bot]

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.

github-actions[bot] avatar Sep 13 '22 00:09 github-actions[bot]