[BUG] - AWS Deploy failing with `Failed to identify fetch peer certificates`

Open aktech opened this issue 1 year ago • 4 comments

Describe the bug

As seen by @ronald50928

He's facing issues deploying Nebari inside a private network. The instance he is deploying from is inside the VPC (connecting via a VPN).

Expected behavior

Deployment completing with no errors

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

Ran the following command:

nebari deploy -c nebari-config.yml

with the following configuration:

provider: aws
namespace: dev
nebari_version: 2024.3.2
project_name: <SANITIZED>
domain: <SANITIZED>
helm_extensions: []
monitoring:
  enabled: true
argo_workflows:
  enabled: true
  nebari_workflow_controller:
    enabled: true
ci_cd:
  type: none
terraform_state:
  type: remote
ingress:
  terraform_overrides:
    load-balancer-annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
security:
  keycloak:
    initial_root_password: <SANITIZED>
  authentication:
    type: password

amazon_web_services:
  kubernetes_version: '1.29'
  region: us-east-1
  permissions_boundary: arn:aws:iam::<ACCOUNT-ID>:policy/<Permissions-Boundary-POLICY-NAME>
  existing_subnet_ids: ["subnet-<SUBNET-ID-1>", "subnet-<SUBNET-ID-2>"]
  existing_security_group_id: sg-<SECURITY-GROUP-1>
  node_groups:
    general:
      instance: m5.2xlarge
      min_nodes: 1
      max_nodes: 1
    user:
      instance: m5.xlarge
      min_nodes: 1
      max_nodes: 100
    worker:
      instance: m5.xlarge
      min_nodes: 0
      max_nodes: 450
jhub_apps:
  enabled: true

Command output

[terraform]: ╷
[terraform]: │ Error: Failed to identify fetch peer certificates
[terraform]: │
[terraform]: │   with module.kubernetes.data.tls_certificate.this,
[terraform]: │   on modules/kubernetes/main.tf line 82, in data "tls_certificate" "this":
[terraform]: │   82: data "tls_certificate" "this" {
[terraform]: │
[terraform]: │ failed to fetch certificates from URL 'https': Get
[terraform]: │ "https://oidc.eks.us-east-1.amazonaws.com:443/id/A381A8C89FAEE2FC03AF83E334B12AEE":
[terraform]: │ dial tcp: lookup oidc.eks.us-east-1.amazonaws.com on 172.17.0.2:53: no such
[terraform]: │ host
[terraform]: ╵
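
The last lines of the output point at a DNS lookup failure (`no such host` from the resolver at 172.17.0.2) rather than at a TLS problem as such. A minimal sketch for reproducing that lookup from the deploy host, outside Terraform, might look like the following. The helper name `resolve_host` and its injectable `resolver` parameter are assumptions made for illustration; only the hostname comes from the error above.

```python
import socket

# Hostname taken from the Terraform error above.
OIDC_HOST = "oidc.eks.us-east-1.amazonaws.com"


def resolve_host(host, port=443, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for `host`, or an empty list if the lookup fails.

    `resolver` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to the system resolver.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Roughly the Python counterpart of Go's "no such host" in the log.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host(OIDC_HOST)` on the machine that runs `nebari deploy` returns an empty list when the name cannot be resolved there, which would match the failure shown in the log.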

Versions and dependencies used.

Nebari version: 2024.3.2
Kubernetes version: 1.29

Compute environment

AWS

Integrations

No response

Anything else?

No response

aktech, Apr 12 '24 17:04

I am not sure if it's related, but I once needed to update the default security group created by Nebari to work with the internal VPN that was already in place; on AWS, there was a certain button to include it.

viniciusdc, Apr 15 '24 12:04

I would also try adding the extra certificates field, and try to include it manually:

### Certificate configuration ###
certificate:
  type: existing
  secret_name: <secret-name>

viniciusdc, Apr 15 '24 12:04

> I would also try adding the extra certificates field, and try to include it manually:

This isn't related to SSL certs for the exposed load balancer, as that's not even deployed yet. This is related to connecting to the created k8s cluster.

aktech, Apr 16 '24 13:04

I ran into this error when trying to deploy Nebari to a private/public VPC setup. In my case, I needed to include VPC endpoints for EKS, as found in this Nebari development branch, but I also had to set private_dns_enabled = false on the EKS endpoint in my development branch for the OIDC provider to deploy properly and to prevent this error. Specifically, the lines here: https://github.com/mwengren/nebari/blob/ae769d21648944ea883fe0086e9f14d8bedcb7ef/src/_nebari/stages/infrastructure/template/aws/modules/network/main.tf#L247-L256.

I can't say I entirely understand why this resolved it in my case, but this post and the AWS docs linked from there were what motivated me to change the private DNS setting for the endpoint. After that, everything worked. Sharing in case it helps or is still relevant.
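
For anyone who wants to check whether their own VPC is in the same state before changing Terraform, here is a rough sketch that scans the output of the EC2 `DescribeVpcEndpoints` API for EKS interface endpoints that still have private DNS enabled. The function name and the sample data are hypothetical and not taken from this issue. The response shape (`VpcEndpoints`, `VpcEndpointId`, `ServiceName`, `PrivateDnsEnabled`) is what that API returns, for example via boto3's `describe_vpc_endpoints()`.

```python
def eks_endpoints_with_private_dns(response, region):
    """Return IDs of EKS interface endpoints in `response` with private DNS enabled.

    `response` is assumed to be a dict shaped like the result of the EC2
    DescribeVpcEndpoints API.
    """
    service_name = f"com.amazonaws.{region}.eks"
    return [
        endpoint["VpcEndpointId"]
        for endpoint in response.get("VpcEndpoints", [])
        if endpoint.get("ServiceName") == service_name
        and endpoint.get("PrivateDnsEnabled")
    ]


# Hypothetical sample response for illustration only.
sample_response = {
    "VpcEndpoints": [
        {
            "VpcEndpointId": "vpce-eks-example",
            "ServiceName": "com.amazonaws.us-east-1.eks",
            "PrivateDnsEnabled": True,
        },
        {
            "VpcEndpointId": "vpce-sts-example",
            "ServiceName": "com.amazonaws.us-east-1.sts",
            "PrivateDnsEnabled": True,
        },
    ]
}

flagged = eks_endpoints_with_private_dns(sample_response, "us-east-1")
print(flagged)
```

Any endpoint ID this returns is a candidate for the private DNS change described above. Whether disabling it is right for a given network is a separate question that depends on how the rest of the VPC resolves AWS service names.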

mwengren, Aug 18 '25 15:08