kpt-config-sync `Error: failed to download oci://localhost:5001/test-chart` with Kind and local registry

I'm trying to sync a Helm chart with Config Sync installed in Kind with a local registry, but I'm getting this error on my RootSync:

KNV2004: unable to sync repo
          Error in the helm-sync container: {"Msg":"unexpected error rendering chart, will retry","Err":"failed to render the helm chart: exit status 1, stdout: Error: failed to download \"oci://localhost:5001/test-chart\" at version \"1.0.0\"\n","Args":{}}

Here is my setup to reproduce:

## Set up Kind with local registry
reg_name='kind-registry'
reg_port='5001'
reg_internal_port='5000'
docker run \
    -d --restart=always -p "127.0.0.1:${reg_port}:${reg_internal_port}" --name "${reg_name}" \
    registry:2
cat <<EOF | ./kind create cluster --image kindest/node:v1.24.6 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:${reg_port}"]
    endpoint = ["http://${reg_name}:${reg_internal_port}"]
EOF
docker network connect "kind" "${reg_name}"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-registry-hosting
  namespace: kube-public
data:
  localRegistryHosting.v1: |
    host: "localhost:${reg_port}"
    hostFromContainerRuntime: "${reg_name}:${reg_internal_port}"
    hostFromClusterNetwork: "${reg_name}:${reg_internal_port}"
    help: "https://kind.sigs.k8s.io/docs/user/local-registry/"
EOF

## Install CS
kubectl apply -f https://github.com/GoogleContainerTools/kpt-config-sync/releases/download/v1.13.0/config-sync-manifest.yaml

## Create local Helm chart
helm create test-chart
helm package test-chart --version 1.0.0
helm push test-chart-1.0.0.tgz oci://localhost:${reg_port}

## Confirming that I can successfully pull the Helm chart from the local registry
helm pull oci://localhost:${reg_port}/test-chart --version 1.0.0

## Sync local Helm chart
cat << EOF | kubectl apply -f -
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  sourceType: helm
  helm:
    repo: oci://localhost:${reg_port}
    chart: test-chart
    version: 1.0.0
    releaseName: test-chart
    auth: none
EOF

JFYI, if I use a public Helm chart in GHCR instead of the local registry, the sync succeeds:

cat << EOF | kubectl apply -f -
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  sourceType: helm
  helm:
    repo: oci://ghcr.io/mathieu-benoit
    chart: my-chart
    version: 0.1.0
    releaseName: my-chart
    auth: none
EOF

I can also confirm that this Docker flow works with the same setup:

docker pull gcr.io/google-samples/hello-app:1.0
docker tag gcr.io/google-samples/hello-app:1.0 localhost:${reg_port}/hello-app:1.0
docker push localhost:${reg_port}/hello-app:1.0
kubectl create deployment hello-server --image=localhost:${reg_port}/hello-app:1.0

Not sure if the error is coming from the Kind setup or from Config Sync, so logging this here. CC: @nan-yu @xinnywinne

Oct 02 '22 13:10 mathieu-benoit

In addition to the helm scenario described in the main description of this issue, I just tried the oci source type, and I'm getting an error there too. Steps to reproduce:

## Build the OCI artifact
cat <<EOF> test-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test
EOF
tar -cf test-namespace.tar test-namespace.yaml
oras push \
    localhost:${reg_port}/test-namespace:v1 \
    test-namespace.tar

## Confirming that I can successfully pull this OCI artifact from local registry
oras pull localhost:${reg_port}/test-namespace:v1

## Sync this OCI artifact with Config Sync
cat << EOF | kubectl apply -f -
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  sourceType: oci
  oci:
    image: localhost:${reg_port}/test-namespace:v1
    auth: none
EOF

Error on the RootSync:

errors:
      - code: "2004"
        errorMessage: |-
          KNV2004: unable to sync repo
          Error in the oci-sync container: {"Msg":"unexpected error fetching package, will retry","Err":"failed to pull image localhost:5001/test-namespace:v1: Get \"https://localhost:5001/v2/\": dial tcp [::1]:5001: connect: connection refused; Get \"http://localhost:5001/v2/\": dial tcp [::1]:5001: connect: connection refused","Args":{}}

          For more information, see https://g.co/cloud/acm-errors#knv2004
      lastUpdate: "2022-10-02T14:58:57Z"
      ociStatus:
        dir: .
        image: localhost:5001/test-namespace:v1

Oct 02 '22 15:10 mathieu-benoit

I expect this has to do with the details of local kind registry rather than a bug in helm/oci.

From https://kind.sigs.k8s.io/docs/user/local-registry/#using-the-registry

If you build your own image and tag it like localhost:5001/image:foo and then use it in kubernetes as localhost:5001/image:foo. And use it from inside of your cluster application as kind-registry:5000.

Have you tried using kind-registry as the registry name?

If that doesn't work - For our local/kind e2e testing we spin up a git server as a service in the cluster. We don't have any local/kind e2e testing for oci/helm yet, but the way I would probably go about that is to spin up the registry as a service in the cluster as well.
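As a rough illustration of the "registry as a service in the cluster" idea, the sketch below prints a Deployment and a Service for the stock `registry:2` image. The function name `registry_manifest`, the resource name `registry`, and the default namespace are all assumptions made for this example; they are not taken from the Config Sync test suite.

```shell
# Hypothetical helper: print manifests for an in-cluster registry.
# Usage: registry_manifest [NAME] [SERVICE_PORT]
registry_manifest() (
  name="${1:-registry}"   # assumed resource name
  port="${2:-5000}"       # port exposed by the Service
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: registry
        image: registry:2
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  selector:
    app: ${name}
  ports:
  - port: ${port}
    targetPort: 5000
EOF
)
```

It could be applied with `registry_manifest | kubectl apply -f -`. Note that a registry started this way still serves plain HTTP unless TLS is configured separately.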

Oct 03 '22 16:10 sdowell

Hi @sdowell, to be honest I would like to look at the code for both paths, oci and helm, since they have different implementations and return different errors.

Looking at https://github.com/stefanprodan/flux-local-dev, it seems that they are able to deploy their OCI artifacts with the exact same setup as mine. I'm trying to see if there are any differences or something I'm missing. For example, I see that in their OCIRepository resource they have insecure: true, which, AFAIK, we don't have, but I don't know if it could be related to the 2 issues I'm facing.

And I think this setup could be a good win for the CI/e2e tests of CS with Helm/OCI too, as soon as we get this working?

Back to your suggestion:

using kind-registry as the registry name?

What do you mean? In the name of the image when doing push/pull, etc.?

Oct 03 '22 17:10 mathieu-benoit

Our kind tests run in parallel on multiple clusters so for our e2e testing use case we would probably want to isolate the registry for each cluster anyways.

From kind's documentation it sounds like kind-registry:5000 should be used from inside the cluster instead of localhost:5001. It also looks like they use kind-registry:5000 in the repo you linked to.
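To make the two names concrete: the same registry is `localhost:5001` from the host and `kind-registry:5000` from inside the cluster. A minimal sketch of that mapping, assuming the variable values from the setup script in this issue (the helper name `to_cluster_ref` is made up for illustration):

```shell
# Defaults mirror the setup script in this issue.
reg_name="${reg_name:-kind-registry}"
reg_port="${reg_port:-5001}"
reg_internal_port="${reg_internal_port:-5000}"

# Hypothetical helper: rewrite a host-side reference to its in-cluster form.
# References that do not point at localhost:${reg_port} are printed unchanged.
to_cluster_ref() (
  ref="$1"
  scheme=""
  case "$ref" in
    oci://*) scheme="oci://"; ref="${ref#oci://}" ;;
  esac
  host_prefix="localhost:${reg_port}"
  case "$ref" in
    "$host_prefix"|"$host_prefix"/*)
      rest=${ref#"$host_prefix"}
      ref="${reg_name}:${reg_internal_port}${rest}" ;;
  esac
  printf '%s%s\n' "$scheme" "$ref"
)
```

For example, `to_cluster_ref localhost:5001/test-namespace:v1` gives the value to try in `oci.image`, and `to_cluster_ref oci://localhost:5001` the value to try in `helm.repo`.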

Oct 03 '22 17:10 sdowell

Gotcha, good catch, I will try that soon and will report back here, thanks @sdowell. Something around:

cat << EOF | kubectl apply -f -
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  sourceType: oci
  oci:
    image: kind-registry:5000/test-namespace:v1
    auth: none
EOF

Same for Helm.

Oct 03 '22 17:10 mathieu-benoit

Hey @sdowell, following up on this, I just ran some tests with helm.repo: oci://${reg_name}:${reg_internal_port} and oci.image: ${reg_name}:${reg_internal_port} in RootSyncs, but both still fail.

For oci, the error message is different with oci.image: ${reg_name}:${reg_internal_port}:

failed to pull image kind-registry:5000/test-namespace:v1: Get https://kind-registry:5000/v2/: http: server gave HTTP response to HTTPS client

And with oci.image: localhost:${reg_port} it was:

failed to pull image localhost:5001/test-namespace:v1: Get https://localhost:5001/v2/: dial tcp [::1]:5001: connect: connection refused; Get \"http://localhost:5001/v2/

For helm in both cases, still the same error:

failed to render the helm chart: exit status 1, stdout: Error: failed to download oci://localhost:5001/test-chart at version 1.0.0
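The three messages above fail at different layers, so it can help to sort them before digging further. The sketch below is only a triage aid under my own reading of the messages: "connection refused" on localhost means nothing is listening at that address from inside the pod, and "server gave HTTP response to HTTPS client" means the client spoke TLS to a plain-HTTP registry. The function name and the labels it prints are hypothetical.

```shell
# Hypothetical triage helper: map a sync error message to a likely cause.
classify_sync_error() (
  msg="$1"
  case "$msg" in
    *"server gave HTTP response to HTTPS client"*)
      echo "plain-http-registry" ;;    # registry reachable, but not over TLS
    *"connection refused"*)
      echo "unreachable-address" ;;    # nothing listening at that host:port
    *"failed to download"*)
      echo "generic-helm-failure" ;;   # helm hides the underlying cause
    *)
      echo "unknown" ;;
  esac
)
```

Usage: `classify_sync_error "$(kubectl get rootsync root-sync -n config-management-system -o yaml)"`.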

Oct 03 '22 22:10 mathieu-benoit

failed to pull image kind-registry:5000/test-namespace:v1: Get https://kind-registry:5000/v2/: http: server gave HTTP response to HTTPS client

It appears this has to do with the insecure flag that you referenced above. Looking into the OCI library that we use for fetching images, the transport only falls back on http for localhost or if the insecure flag is set.
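A simplified model of that rule, written as a shell function for illustration only (the real library's logic is more involved, and `allows_http_fallback` is a made-up name): plain HTTP is attempted only when the host is localhost or the caller opts in.

```shell
# Simplified model of the behavior described above, not the library's code.
# Usage: allows_http_fallback REGISTRY_OR_IMAGE_REF [INSECURE]
allows_http_fallback() (
  host="${1%%/*}"        # drop any repository path
  host="${host%%:*}"     # drop the port
  insecure="${2:-false}"
  if [ "$insecure" = "true" ] || [ "$host" = "localhost" ]; then
    echo "http-fallback"
  else
    echo "https-only"
  fi
)
```

This matches the two errors reported earlier: the localhost:5001 attempt shows both an https and an http request, while the kind-registry:5000 attempt shows only https.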

I don't think our API currently supports toggling that flag, and there are currently no plans to add support for it. I expect it's a similar scenario for helm, but helm is just returning a generic error.

cc @nan-yu @xinnywinne

Oct 03 '22 22:10 sdowell

Hi @mathieu-benoit, I will look into this for helm support and let you know. Thanks @sdowell for helping do the initial analysis.

Oct 03 '22 23:10 xinnywinne

I confirmed that the insecure flag is the root cause of the OCI failure. After adding the flag, the updated oci-sync container is able to sync the image from image: kind-registry:5000/test-namespace:v1. We're aware of this issue and have created an internal bug to track it.

Oct 05 '22 19:10 nan-yu

Hi @mathieu-benoit, I am able to reproduce the Helm failure. When I debug it, I see the error message failed to do request: Head \"https://kind-registry:5000/v2/test-chart/manifests/0.1.0\": http: server gave HTTP response to HTTPS client, and here is an open issue related to this: https://github.com/helm/helm/issues/6324. Currently, Helm still does not support pulling from an insecure registry directly.

Oct 10 '22 16:10 xinnywinne

Thanks @nan-yu for tracking the issue internally for OCI.

Thanks @xinnywinne for diagnosing the issue with Helm. The link you shared is about an issue with the helm push command, but I don't have a problem with push. On the other hand, I see that the helm pull command has an --insecure-skip-tls-verify parameter. Do you think that could help?

Oct 12 '22 16:10 mathieu-benoit

helm pull/push/template work with oci://localhost:5001 but not with oci://kind-registry:5000. I have tried helm template --insecure-skip-tls-verify, and it does not help. Here is the most recent open issue: https://github.com/helm/helm/issues/11352. An open PR mentioned in that issue, https://github.com/helm/helm/pull/9564, relates to your question.

Oct 12 '22 17:10 xinnywinne
