[BUG] - AWS Deploy failing with `Failed to identify fetch peer certificates`
### Describe the bug
As seen by @ronald50928
He's facing issues deploying Nebari inside a private network. The instance he is deploying from is inside the VPC (connecting via a VPN).
### Expected behavior
The deployment completes with no errors.
### OS and architecture in which you are running Nebari
Linux
### How to Reproduce the problem?
Ran the following command:

```shell
nebari deploy -c nebari-config.yml
```
with the following configuration (indentation reconstructed; nesting follows the Nebari config schema):

```yaml
provider: aws
namespace: dev
nebari_version: 2024.3.2
project_name: <SANITIZED>
domain: <SANITIZED>
helm_extensions: []
monitoring:
  enabled: true
argo_workflows:
  enabled: true
  nebari_workflow_controller:
    enabled: true
ci_cd:
  type: none
terraform_state:
  type: remote
ingress:
  terraform_overrides:
    load-balancer-annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
security:
  keycloak:
    initial_root_password: <SANITIZED>
  authentication:
    type: password
amazon_web_services:
  kubernetes_version: '1.29'
  region: us-east-1
  permissions_boundary: arn:aws:iam::<ACCOUNT-ID>:policy/<Permissions-Boundary-POLICY-NAME>
  existing_subnet_ids: ["subnet-<SUBNET-ID-1>", "subnet-<SUBNET-ID-2>"]
  existing_security_group_id: sg-<SECURITY-GROUP-1>
  node_groups:
    general:
      instance: m5.2xlarge
      min_nodes: 1
      max_nodes: 1
    user:
      instance: m5.xlarge
      min_nodes: 1
      max_nodes: 100
    worker:
      instance: m5.xlarge
      min_nodes: 0
      max_nodes: 450
jhub_apps:
  enabled: true
```
### Command output
```
[terraform]: ╷
[terraform]: │ Error: Failed to identify fetch peer certificates
[terraform]: │
[terraform]: │   with module.kubernetes.data.tls_certificate.this,
[terraform]: │   on modules/kubernetes/main.tf line 82, in data "tls_certificate" "this":
[terraform]: │   82: data "tls_certificate" "this" {
[terraform]: │
[terraform]: │ failed to fetch certificates from URL 'https': Get
[terraform]: │ "https://oidc.eks.us-east-1.amazonaws.com:443/id/A381A8C89FAEE2FC03AF83E334B12AEE":
[terraform]: │ dial tcp: lookup oidc.eks.us-east-1.amazonaws.com on 172.17.0.2:53: no such
[terraform]: │ host
[terraform]: ╵
```
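The "no such host" line points to DNS resolution failing on the deploy host. A quick diagnostic sketch to confirm this from the machine running `nebari deploy` (the `<ISSUER-ID>` path segment is a placeholder; substitute the ID from your cluster's OIDC issuer URL, as shown in the error above):

```shell
# Check whether the OIDC issuer hostname resolves from the deploy host.
nslookup oidc.eks.us-east-1.amazonaws.com

# If it resolves, verify the issuer is reachable over HTTPS
# (this is roughly what the tls_certificate data source needs to do).
curl -sv "https://oidc.eks.us-east-1.amazonaws.com/id/<ISSUER-ID>/.well-known/openid-configuration" -o /dev/null
```

If the `nslookup` fails here too, the problem is the VPC/VPN DNS configuration rather than Nebari itself.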
### Versions and dependencies used

- Nebari version: 2024.3.2
- Kubernetes version: 1.29
### Compute environment
AWS
### Integrations
No response
### Anything else?
No response
I'm not sure if it's related, but I once needed to update the default security group created by Nebari so it would work with an internal VPN that was already in place; the AWS console has an option to add it there.
I would also try adding the extra certificate field and including it manually:

```yaml
### Certificate configuration ###
certificate:
  type: existing
  secret_name: <secret-name>
```
This isn't related to SSL certs for the exposed load balancer, as that isn't even deployed at this point. It's related to connecting to the created k8s cluster.
I ran into this error when trying to deploy Nebari to a private/public VPC setup. In my case, I needed to include VPC endpoints for EKS, as found in this Nebari development branch, but I also had to set the EKS endpoint to `private_dns_enabled = false` in my development branch in order for the OIDC provider to deploy properly and prevent this error. Specifically, these lines: https://github.com/mwengren/nebari/blob/ae769d21648944ea883fe0086e9f14d8bedcb7ef/src/_nebari/stages/infrastructure/template/aws/modules/network/main.tf#L247-L256.

I can't say I entirely understand why this resolved it in my case, but this post and the AWS docs linked from there were what motivated me to change the private DNS setting for the endpoint. After that, all worked well. Sharing in case it helps/is still relevant.
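For reference, a minimal Terraform sketch of such an interface endpoint with private DNS disabled. This is only an illustration under assumed resource names (`aws_vpc.main`, `aws_subnet.private`, `aws_security_group.endpoint` are hypothetical); the actual names in the branch linked above may differ:

```hcl
# Sketch: EKS interface VPC endpoint with private DNS disabled.
# Resource references below are hypothetical, not Nebari's actual names.
resource "aws_vpc_endpoint" "eks" {
  vpc_id             = aws_vpc.main.id
  service_name       = "com.amazonaws.us-east-1.eks"
  vpc_endpoint_type  = "Interface"
  subnet_ids         = aws_subnet.private[*].id
  security_group_ids = [aws_security_group.endpoint.id]

  # With private DNS disabled, the OIDC issuer hostname
  # (oidc.eks.<region>.amazonaws.com) keeps resolving via public DNS,
  # which is what let the tls_certificate lookup succeed in my case.
  private_dns_enabled = false
}
```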