[BUG] : crc start is having "r.CreateEndpoint() = connection was refused" log line

Open onderson opened this issue 2 years ago • 11 comments

General information

  • OS: macOS
  • Hypervisor: hyperkit
  • Did you run crc setup before starting it (Yes/No)? Yes
  • Running CRC on: Laptop

CRC version

CRC version: 2.3.0+502bf1d9
OpenShift version: 4.10.12
Podman version: 3.4.4

CRC status

DEBU CRC version: 2.3.0+502bf1d9
DEBU OpenShift version: 4.10.12
DEBU Podman version: 3.4.4
DEBU Running 'crc status'
DEBU Checking file: /Users/burgekaraer/.crc/machines/crc/.crc-exist
DEBU Checking file: /Users/burgekaraer/.crc/machines/crc/.crc-exist
DEBU Running SSH command: df -B1 --output=size,used,target /sysroot | tail -1
DEBU Using ssh private keys: [/Users/burgekaraer/.crc/machines/crc/id_ecdsa /Users/burgekaraer/.crc/cache/crc_vfkit_4.10.12_amd64/id_ecdsa_crc]
DEBU SSH command results: err: <nil>, output: 85350920192 15333134336 /sysroot
DEBU Unexpected operator status for etcd: RecentBackup
CRC VM:          Running
OpenShift:       Running (v4.10.12)
Podman:
Disk Usage:      15.33GB of 85.35GB (Inside the CRC VM)
Cache Usage:     36.98GB
Cache Directory: /Users/burgekaraer/.crc/cache
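
As a sanity check on the figures above, the Disk Usage line can be reproduced from the raw df output in the debug log. This is a minimal Python sketch, assuming decimal gigabytes (1 GB = 10^9 bytes), which is what the reported values appear to use. `parse_df_line` and `format_usage` are hypothetical helper names for illustration, not part of crc.

```python
def parse_df_line(line: str) -> tuple[int, int, str]:
    """Split one 'size used target' line from `df -B1 --output=size,used,target`."""
    size, used, target = line.split()
    return int(size), int(used), target


def format_usage(size: int, used: int) -> str:
    """Render byte counts as decimal gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return f"{used / 1e9:.2f}GB of {size / 1e9:.2f}GB"


# Values copied from the 'SSH command results' debug line above.
size, used, target = parse_df_line("85350920192 15333134336 /sysroot")
print(format_usage(size, used))  # prints "15.33GB of 85.35GB"
```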

CRC config

- consent-telemetry                     : no
- cpus                                  : 8
- disk-size                             : 80
- memory                                : 16384

Host Operating System

ProductName:	macOS
ProductVersion:	12.4
BuildVersion:	21F79

Steps to reproduce

crc start

Expected

The r.CreateEndpoint() = connection was refused line should not appear.

Actual

r.CreateEndpoint() = connection was refused

Logs

Checking if running as non-root
Checking if crc-admin-helper executable is cached
Checking for obsolete admin-helper executable
Checking if running on a supported CPU architecture
Checking minimum RAM requirements
Checking if crc executable symlink exists
Checking if running emulated on a M1 CPU
Checking if vfkit is installed
Checking if old launchd config for tray and/or daemon exists
Checking if crc daemon plist file is present and loaded
Loading bundle: crc_vfkit_4.10.12_amd64...
CRC VM is running
Check internal and public DNS query...
Check DNS query from host...
Verifying validity of the kubelet certificates...
Starting OpenShift cluster... [waiting for the cluster to stabilize]
r.CreateEndpoint() = connection was refused
Operators are stable (3/3)...

Before gathering the logs, try the following to see if it fixes your issue:

$ crc delete -f
$ crc cleanup
$ crc setup
$ crc start --log-level debug

Please consider posting the output of crc start --log-level debug on http://gist.github.com/ and post the link in the issue.

onderson avatar Jun 08 '22 10:06 onderson

@onderson It is coming from https://github.com/containers/gvisor-tap-vsock/blob/main/pkg/services/forwarder/tcp.go#L44. Does it break your crc use case, or is it just the error message that bothers you?

praveenkumar avatar Jun 08 '22 10:06 praveenkumar

@praveenkumar Only the error message. I have not had any further issues so far. I raised it to get some certainty about its implications, since I am not sure whether it could cause anything further.

onderson avatar Jun 08 '22 10:06 onderson

@onderson It shouldn't, because I think it is just one missed TCP forward. Do keep using the cluster and let us know if you face any issues.

praveenkumar avatar Jun 08 '22 10:06 praveenkumar

@praveenkumar, after a stop and start, I am getting:

INFO Check internal and public DNS query...
WARN Failed public DNS query from the cluster: ssh command error:
command : curl --head quay.io
err     : Process exited with status 7
 :

I think there is an issue with networking. How can I debug and fix it further?
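
As a starting point for debugging, the "Process exited with status 7" in the warning is the exit code of the curl command itself. The sketch below is a small Python lookup of a few exit codes as documented in the curl man page (EXIT CODES section). `explain_curl_exit` is a hypothetical helper name, and the table is intentionally partial.

```python
# Partial table of curl exit codes, per the curl man page.
CURL_EXIT_CODES = {
    0: "success",
    6: "could not resolve host (DNS problem)",
    7: "failed to connect to host (refused or unreachable)",
    28: "operation timed out",
    35: "TLS/SSL handshake failed",
    60: "peer certificate could not be verified",
}


def explain_curl_exit(status: int) -> str:
    """Return a short description for a curl exit status."""
    return CURL_EXIT_CODES.get(
        status, f"unlisted curl exit code {status}; see 'man curl'"
    )


# Status taken from the WARN line above.
print(explain_curl_exit(7))
```

Under that reading, status 7 would suggest the connection attempt to quay.io failed rather than the name lookup (which would normally surface as status 6), although the thread does not confirm the root cause.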

onderson avatar Jun 09 '22 09:06 onderson

@onderson Did something change on the host during the stop => start? Our integration tests cover this scenario and we have not observed any network issue.

praveenkumar avatar Jun 13 '22 05:06 praveenkumar

@praveenkumar No, nothing has changed. I guess this is a frequent, intermittent issue, because I saw the check pass after a while. I also see the cluster crashing after a while.

onderson avatar Jun 14 '22 07:06 onderson

I'm experiencing the same thing. I can get it running again by stopping, deleting and cleaning. It comes up, then it does not error out on quay, I can deploy a few manifests, make projects etc, but after 10-15 minutes, it just dies on me.

roderik avatar Jun 19 '22 11:06 roderik

@roderik Does it die because of resource limits, since you deploy a couple of manifests as part of your workload? Can you try increasing the memory/CPU to match your requirements and see if it still dies for you?

praveenkumar avatar Jun 20 '22 05:06 praveenkumar

Went with 32 GB and 16 CPUs; it is a big manifest. I found some "too many open files" errors in the logs, but I could not get it to work even with a ulimit of 10k. Moved to the IBM cloud, where it deploys like it should.
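
For reference, the open-file limit mentioned here can be inspected from Python with the standard `resource` module (Unix only). This is a minimal sketch that only reports the limit for the process it runs in on the host. The thread does not say whether the "too many open files" errors came from the host or from inside the CRC VM, so treat this as a first check rather than a diagnosis. `open_file_limits` and `is_soft_limit_below` are hypothetical helper names.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_soft_limit_below(threshold: int) -> bool:
    """True if the soft limit is finite and lower than `threshold`."""
    soft, _hard = open_file_limits()
    if soft == resource.RLIM_INFINITY:
        return False
    return soft < threshold


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard}")
# 10_000 mirrors the "ulimit of 10k" mentioned in the comment above.
print("soft limit below 10000:", is_soft_limit_below(10_000))
```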

roderik avatar Jun 20 '22 14:06 roderik

@roderik Looks like you found a solution with IBM cloud (not with CRC, because of resource constraints around the workload).

praveenkumar avatar Jun 22 '22 03:06 praveenkumar

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 20 '22 21:09 stale[bot]

@onderson Thanks for the issue. Since we have a new version of crc, can you try that and create a new issue if you hit this again?

/close

praveenkumar avatar Sep 06 '23 11:09 praveenkumar

@praveenkumar: Closing this issue.

openshift-ci[bot] avatar Sep 06 '23 11:09 openshift-ci[bot]