[BUG] Reasonable Image Size -- Could Not Open Tar File
Describe the bug
It looks like cnf-testsuite cannot open the tar version of the chart to check the image size. It appears to store the archive in "tmp". The log reports that the image was not found; however, the pod is running with the specified image.
To Reproduce
Steps to reproduce the behavior:
- Run "reasonable_image_size"
Expected behavior
The test would find the image size.
Logs
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- docker save localhost:32000/matrixx-activemq:5250-SNAPSHOT -o /tmp/image.tar
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient.exec stderr: Error response from daemon: reference does not exist
command terminated with exit code 1
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- gzip -f /tmp/image.tar
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient.exec stderr: gzip: /tmp/image.tar: No such file or directory
command terminated with exit code 1
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- wc -c /tmp/image.tar.gz | awk '{print$1}'
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient.exec stderr: wc: /tmp/image.tar.gz: No such file or directory
command terminated with exit code 1
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: compressed_size: localhost:32000/matrixx-activemq:5250-SNAPSHOT = ''
E, [2022-05-19 01:33:28 -04:00 #2783727] ERROR -- cnf-testsuite: invalid compressed_size: localhost:32000/matrixx-activemq:5250-SNAPSHOT = '', Invalid Int64:
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Deployment resource_name: engine-controller namespace: default
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: KubectlClient::Get.resource command: kubectl get Deployment engine-controller -o json -n default
I, [2022-05-19 01:33:28 -04:00 #2783727] INFO -- cnf-testsuite: kubectl get resource volumes: [{"configMap" => {"defaultMode" => 420, "name" => "engine-controller-templates"}, "name" => "engine-controller-templates"}]
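The log above shows a chain of dependent failures: `docker save` fails, so `gzip` and `wc` have nothing to read, and the empty string that comes back cannot be parsed as an integer. Below is a minimal Python sketch of guarding that last step so an empty result is reported as "no size available" rather than as a parse error. The testsuite itself is written in Crystal; `parse_compressed_size` is a hypothetical helper for illustration, not part of cnf-testsuite.

```python
from typing import Optional


def parse_compressed_size(wc_output: str) -> Optional[int]:
    """Parse the byte count printed by `wc -c FILE | awk '{print $1}'`.

    Returns None when the output is empty or not a plain decimal number,
    which is what happens when an earlier step (pull/save/gzip) failed.
    """
    text = wc_output.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    return int(text)
```

A caller could treat `None` as "the image could not be saved" and point the user at the first failing command in the log instead of at the integer conversion.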
@EricLo-417 Looks like the docker pod on the cluster cannot access the container registry where the image is stored. Most likely needs additional cluster configuration. Can you help out with some info below to help us proceed?
- What is the address of the container registry used in the manifest or helm chart?
- Is this registry hosted within the kubernetes cluster?
Attached is the kind config I used when running the cnf-testsuite. We deploy a Docker image registry on the host machine, which the kind cluster connects to and uses.
The path of the image used in the helm chart is localhost:32000
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:32000"]
    endpoint = ["http://kind-image-registry:5000"]
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraMounts:
  - hostPath: /home/eric/Documents/helm/certifications/cnf-testsuite/matrixx-chf-charging-bridge/target/automated-k8s-tests/tests/Target+0-CNF-Testsuite_Workload-CNF_Testsuite_Workload/data
    containerPath: /home/data
@EricLo-417 Please try the below mentioned changes and let us know if these changes resolve the issue.
- Update the registry address used in the helm chart of the CNF to kind-image-registry:5000. After the update, the image address would look like this: kind-image-registry:5000/matrixx-activemq:5250-SNAPSHOT
- To reflect the above update, the config value for containerdConfigPatches should be updated to the value below (note the inversion of key/value compared to the earlier kind config)
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."kind-image-registry:5000"]
    endpoint = ["localhost:32000"]
@EricLo-417 Please do let us know if you face any issues with my notes above.
@agentpoyo @lixuna Moving this issue to review and close.
No AC Required; will reopen if end user faces continued issues after the config issue was identified.
I have just tried it and am getting the same error with kind-image-registry:
invalid compressed_size: kind-image-registry:5000/sba-5gc-networkfunctions-chf-standalone:5260-SNAPSHOT = '', Invalid Int64: ""
Here's an observation. Both machines use the same latest checkout of main.
On a new Ubuntu 22.04 instance that we are building to replace an older Ubuntu dev machine, I see the int64 error:
pair@cnfdev4:~/workspace/drew/cnf-testsuite$ ./cnf-testsuite reasonable_image_size
✖️ FAILED: Image size too large 🦖 ⚖️👀
E, [2023-01-31 16:26:04 +00:00 #1645989] ERROR -- cnf-testsuite: invalid compressed_size: coredns/coredns:1.7.1 = '', Invalid Int64: ""
pair@cnfdev4:~/workspace/drew/cnf-testsuite$ crystal version
Crystal 1.6.0 [41573fadb] (2022-10-06)
LLVM: 13.0.1
Default target: x86_64-unknown-linux-gnu
On our older Ubuntu build machine, based on Debian Buster but with the same Crystal version, it passes without error:
pair@cnfdev03:~/workspace/drew/cnf-testsuite$ ./cnf-testsuite reasonable_image_size
✔️ PASSED: Image size is good 🐜 ⚖️👀
pair@cnfdev03:~/workspace/drew/cnf-testsuite$ crystal version
Crystal 1.6.0 [41573fadb] (2022-10-06)
LLVM: 13.0.1
Default target: x86_64-unknown-linux-gnu
Both were tested using the sample-coredns-cnf in our samples/ directory in the code.
I was able to reproduce this issue on Ubuntu 22.04, on both the main branch and the latest released binary v0.42.2.
I put up a git repo with the code required to reproduce the issue - https://git.sr.ht/~akash/cnf-testsuite-issue-1597. The readme in the repo has detailed steps to reproduce the issue, along with screenshots.
I'll look into what is causing the issue and see if we can fix it.
This issue had two causes.
1. Insecure access was not configurable for the Docker daemon set up by the testsuite
Private registries hosted on the cluster usually have to be explicitly trusted before their registry API endpoints can be used over HTTP. The Docker daemon has to be configured to allow access to the private image registry on the cluster via HTTP endpoints.
2. The Docker daemon requires an FQDN to use a private registry
The Docker daemon is set up by the testsuite in the cnf-testsuite namespace. For the reasonable_image_size test, when the testsuite refers to an image registry in another namespace, it requires the service discovery URL for that service. It cannot access the registry with just the service name.
So if the helm chart mentions foobar:5000/coredns:1.6.7, the Docker daemon wouldn't know which namespace the service is in.
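To illustrate the second point, here is a small Python sketch of rewriting an image reference so that a bare service name is replaced by its cluster FQDN. This is not the testsuite's implementation (which is in Crystal); `rewrite_image_registry` is a hypothetical name, and the `default` namespace used for `foobar` in the usage note is an assumption made only for the example.

```python
from typing import Dict


def rewrite_image_registry(image: str, registry_fqdns: Dict[str, str]) -> str:
    """Swap a known short registry host for its fully qualified name.

    `registry_fqdns` maps "host:port" as written in the helm chart to the
    "host:port" that is resolvable from another namespace.
    Images whose registry part is not in the mapping are returned unchanged.
    """
    registry, sep, rest = image.partition("/")
    if sep and registry in registry_fqdns:
        return f"{registry_fqdns[registry]}/{rest}"
    return image
```

With a mapping of `{"foobar:5000": "foobar.default.svc.cluster.local:5000"}`, the reference `foobar:5000/coredns:1.6.7` becomes `foobar.default.svc.cluster.local:5000/coredns:1.6.7`, while a Docker Hub reference such as `coredns/coredns:1.7.1` is left alone.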
Proposed solution in PR-1800
I've added two options for cnf-testsuite.yml to allow some configuration for private image registries. Please review PR-1800 for notes about these options.
With those options, assuming the image registry is running as the kind-image-registry service on the default namespace and serving on port 5000, the following options would have to be added to the cnf-testsuite.yml for the CNF.
docker_insecure_registries: ["kind-image-registry.default.svc.cluster.local:5000"]
image_registry_fqdns:
  "kind-image-registry:5000": "kind-image-registry.default.svc.cluster.local:5000"
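How the testsuite passes docker_insecure_registries to its dockerd pod is not shown in this thread. For orientation only, the sketch below renders the equivalent setting for a standalone Docker daemon, using the `insecure-registries` key of Docker's `daemon.json`. The function name `render_daemon_json` is hypothetical.

```python
import json
from typing import List


def render_daemon_json(insecure_registries: List[str]) -> str:
    """Return daemon.json content that lets dockerd talk to the listed
    registries over plain HTTP (or with untrusted TLS certificates)."""
    config = {"insecure-registries": list(insecure_registries)}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Registry address taken from the example config above.
    print(render_daemon_json(["kind-image-registry.default.svc.cluster.local:5000"]))
```

Each entry has to match the registry host and port exactly as they appear in the image reference after any FQDN rewrite.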
Acceptance Criteria
- [x] Find or build k8s cluster for testing
- [x] Checkout the bug/1597 branch and build the cnf-testsuite binary from source
- [x] Setup the sample-coredns-cnf sample CNF which fails with int64 for the reasonable image size test.
- [x] Run workload, cert or reasonable_image_size tests
- [x] I should not see error int64 for reasonable image size.
- [x] I can see the output here.
@HashNuke
Did you create any samples to use with the new YAML options?
Also, a regular pull of coredns still results in the int64 error because /tmp/image.tar doesn't exist. However, if I manually run the docker commands to pull and save the image, gzip it, and read the size, they all succeed.
LOG Snippet:
I, [2023-06-22 21:23:08 +00:00 #2035884] INFO -- cnf-testsuite: FQDN of the docker image: coredns/coredns:1.7.1
I, [2023-06-22 21:23:08 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- docker pull coredns/coredns:1.7.1
I, [2023-06-22 21:23:13 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec stderr: error pulling image configuration: download failed after attempts=6: remote error: tls: handshake failure
command terminated with exit code 1
I, [2023-06-22 21:23:13 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- docker save coredns/coredns:1.7.1 -o /tmp/image.tar
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec stderr: Error response from daemon: reference does not exist
command terminated with exit code 1
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- gzip -f /tmp/image.tar
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec stderr: gzip: /tmp/image.tar: No such file or directory
command terminated with exit code 1
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec command: kubectl exec -n cnf-testsuite dockerd -t -- wc -c /tmp/image.tar.gz | awk '{print$1}'
✖️ FAILED: Image size too large 🦖 ⚖️👀
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: KubectlClient.exec stderr: wc: /tmp/image.tar.gz: No such file or directory
command terminated with exit code 1
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: compressed_size: coredns/coredns:1.7.1 = ''
E, [2023-06-22 21:23:14 +00:00 #2035884] ERROR -- cnf-testsuite: invalid compressed_size: coredns/coredns:1.7.1 = '', Invalid Int64: ""
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: coredns-coredns namespace: cnfspace
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: cmd: /usr/bin/cnf-testsuite reasonable_image_size
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type_by_task
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: points: {"name" => "reasonable_image_size", "tags" => "microservice, dynamic, workload, cert, normal"}
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: resp: ["microservice", "dynamic", "workload", "cert", "normal"]
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type x: microservice acc:
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type x: dynamic acc:
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type x: workload acc:
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type x: cert acc:
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type x: normal acc: cert
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: task_type: normal
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: upsert_task: task: reasonable_image_size has status: failed and is awarded: 0 points
I, [2023-06-22 21:23:14 +00:00 #2035884] INFO -- cnf-testsuite: results yaml: {"name" => "cnf testsuite", "testsuite_version" => "bug/1597-2023-06-22-212308-bd29e75f", "status" => nil, "command" => "/usr/bin/cnf-testsuite reasonable_image_size", "points" => nil, "exit_code" => 0, "items" => [{"name" => "reasonable_image_size", "status" => "failed","type" => "normal", "points" => 0}]}
Manually running docker commands to successfully save image:
pair@cnfdev4:~/workspace/drew/temp$ docker save coredns/coredns:1.7.1 -o /tmp/image.tar
pair@cnfdev4:~/workspace/drew/temp$ gzip -f /tmp/image.tar
pair@cnfdev4:~/workspace/drew/temp$ wc -c /tmp/image.tar.gz | awk '{print$1}'
12821400
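The manual commands above amount to "gzip the saved tar and count the bytes". A rough Python equivalent is sketched below for anyone who wants to check a saved image outside the testsuite. `gzip_compressed_size` is a hypothetical helper. Unlike `gzip -f`, it keeps the original tar file, and the byte count may differ slightly from the CLI because the default compression levels are not the same.

```python
import gzip
import os
import shutil


def gzip_compressed_size(tar_path: str) -> int:
    """Compress `tar_path` to `tar_path + ".gz"` and return the size in bytes.

    Raises FileNotFoundError if the tar file does not exist, so a failed
    `docker save` is reported at this step rather than as an empty size.
    """
    gz_path = tar_path + ".gz"
    # The source file is opened first; if it is missing, no .gz is created.
    with open(tar_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(gz_path)
```

Calling `gzip_compressed_size("/tmp/image.tar")` after a successful `docker save` returns an integer that can be compared against the size limit directly.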
@agentpoyo I added the commands to reproduce the issue, along with a sample CNF with a fixed config, to this git repository. Apologies, I missed posting this on the PR. The repo has a "sample-coredns-FIXED" dir with a CNF that will pass the test.
Word of caution: Docker on the pair machine's kind cluster has some issue pulling the coredns image, but I tested it on a fresh server and it works as expected. If you see the same error as in the screenshot below, I would recommend using a fresh server. IMHO this issue can be fixed separately.
Docker error on pair machine
This bug fix was released in https://github.com/cncf/cnf-testsuite/releases/tag/v0.42.3
We are using release 0.42.3 and are failing the test with this error message:
✖️ FAILED: Image size too large 🦖 ⚖️👀
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: KubectlClient.exec stderr: wc: /tmp/image.tar.gz: No such file or directory
command terminated with exit code 1
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: compressed_size: localhost:32000/mtx-tra:5270-SNAPSHOT = ''
E, [2023-07-13 18:22:28 -04:00 #1262142] ERROR -- cnf-testsuite: invalid compressed_size: localhost:32000/mtx-tra:5270-SNAPSHOT = '', Invalid Int64: ""
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: cmd: /home/dwilmes/IdeaProjects/helm/certifications/cnf-testsuite/matrixx-dcp/target/test-classes/konstruxx/working/tests/runCNFTestSuite/cnf-testsuite reasonable_image_size wait_count=100
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type_by_task
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: points: {"name" => "reasonable_image_size", "tags" => "microservice, dynamic, workload, cert, normal"}
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: resp: ["microservice", "dynamic", "workload", "cert", "normal"]
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type x: microservice acc:
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type x: dynamic acc:
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type x: workload acc:
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type x: cert acc:
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type x: normal acc: cert
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: task_type: normal
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: upsert_task: task: reasonable_image_size has status: failed and is awarded: 0 points
I, [2023-07-13 18:22:28 -04:00 #1262142] INFO -- cnf-testsuite: results yaml: {"name" => "cnf testsuite", "testsuite_version" => "v0.42.3", "status" => nil, "command" => "/home/dwilmes/IdeaProjects/helm/certifications/cnf-testsuite/matrixx-dcp/target/test-classes/konstruxx/working/tests/runCNFTestSuite/cnf-testsuite reasonable_image_size wait_count=100", "points" => nil, "exit_code" => 0, "items" => [{"name" => "reasonable_image_size", "status" => "failed", "type" => "normal", "points" => 0}]}
@daniel-wilmes Can you please confirm whether the cnf-testsuite.yml file is using the newly introduced registry options?
The docs for the options are available in the CNF Testsuite YAML usage file - https://github.com/cncf/cnf-testsuite/blob/main/CNF_TESTSUITE_YML_USAGE.md#image_registry_fqdns
Here is an example that uses both the docker_insecure_registries and image_registry_fqdns options - https://git.sr.ht/~akash/cnf-testsuite-issue-1597/tree/main/item/sample-coredns-FIXED/cnf-testsuite.yml
@daniel-wilmes Is this issue still occurring? If so, please share additional information to help with debugging.
Sent log file through slack
@HashNuke @agentpoyo please see slack for logs
Acceptance Criteria
- [x] When the new config options are added to the cnf-testsuite.yml config file, I should not see "error opening tar file" in the logs or the tests
@daniel-wilmes is this working as expected for you? Can you please send the logs for our review?
In the weekly meeting, we were told that reasonable image size is not passing as expected. @daniel-wilmes please send the logs and we will assist.
@daniel-wilmes Please let us know if this issue can be closed
@lixuna Daniel had shared a log file. I'll look into the issue and update this ticket appropriately.
Shared the error snippet from the log file with Daniel in private chat. The errors point to the secret needed to access the private Docker registry (to pull the image) not being available.
To access secure private registries, the testsuite looks for the secret (from imagePullSecrets in the spec template) in the namespace of the resource.
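As an aid for checking this by hand, the Python sketch below lists the secret names referenced by a workload, assuming a Deployment-style manifest as returned by `kubectl get ... -o json`. `image_pull_secret_names` is a hypothetical helper and not the testsuite's own code.

```python
from typing import Any, Dict, List


def image_pull_secret_names(resource: Dict[str, Any]) -> List[str]:
    """Return the names listed under spec.template.spec.imagePullSecrets.

    Missing keys are treated as "no secrets configured".
    """
    pod_spec = resource.get("spec", {}).get("template", {}).get("spec", {})
    secrets = pod_spec.get("imagePullSecrets") or []
    return [entry["name"] for entry in secrets if "name" in entry]
```

Each returned name then has to exist as a Secret in the same namespace as the resource itself, since that is where the lookup described above takes place.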
Closing this issue as we now know the cause is outside of the testsuite.