
:bug: Panic in `APIServer.Stop` when `Authn == nil`

jglick opened this pull request 2 years ago • 15 comments

After a test using this repo failed with an error like:

ERROR	controller-runtime.test-env	unable to start the controlplane	{"tries": 4, "error": "timeout waiting for process etcd to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)"}
sigs.k8s.io/controller-runtime/pkg/envtest.(*Environment).startControlPlane
	~/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/envtest/server.go:330
sigs.k8s.io/controller-runtime/pkg/envtest.(*Environment).Start
	~/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/envtest/server.go:260
…

I then saw a further error:

Test Panicked
runtime error: invalid memory address or nil pointer dereference

Full Stack Trace
sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane.(*APIServer).Stop(0x14)
	~/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/testing/controlplane/apiserver.go:425 +0x8f
sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane.(*ControlPlane).Stop(0xc00001a000)
	~/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/testing/controlplane/plane.go:87 +0x3d
sigs.k8s.io/controller-runtime/pkg/envtest.(*Environment).Stop(0xc00001a000)
	~/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/envtest/server.go:194 +0x114
…

This seems to be similar to #1724 (merged toward 0.11.0). Maybe related to #1750?

Untested: I do not know how to reproduce the original etcd error.

jglick, Jan 24 '22
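The change under discussion amounts to checking `Authn` for nil before `Stop` uses it. What follows is a minimal, self-contained model of that guard, a sketch under assumptions rather than controller-runtime's code: `apiServer`, `authn` and `certAuthn` are hypothetical names, and the only behavior borrowed from the thread is that the real `Authn` field gets its default only during startup (per the links AlmogBaku posts in the thread), so a server whose startup never ran still has it set to nil.

```go
package main

import "fmt"

// authn is a hypothetical stand-in for the APIServer's Authn field;
// it is NOT controller-runtime's real interface.
type authn interface {
	Stop() error
}

// certAuthn is a hypothetical default implementation.
type certAuthn struct{}

func (certAuthn) Stop() error { return nil }

// apiServer is a simplified model: Authn is only defaulted inside Start.
type apiServer struct {
	Authn authn
}

func (s *apiServer) Start() error {
	if s.Authn == nil {
		s.Authn = certAuthn{} // lazy default, assumed to mirror the real field
	}
	return nil
}

// Stop guards against Authn being nil because Start never ran (for example,
// etcd failed first). Without the guard, calling s.Authn.Stop() on a nil
// interface panics with "invalid memory address or nil pointer dereference".
func (s *apiServer) Stop() error {
	if s.Authn != nil {
		if err := s.Authn.Stop(); err != nil {
			return fmt.Errorf("stopping authn: %w", err)
		}
	}
	return nil
}

func main() {
	s := &apiServer{} // never started, as after a failed control-plane start
	fmt.Println(s.Stop())
}
```

Putting the check inside `Stop` itself, rather than in each caller, means any cleanup path, including a control-plane teardown after a failed start, can call it safely.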

Welcome @jglick!

It looks like this is your first PR to kubernetes-sigs/controller-runtime 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/controller-runtime has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot, Jan 24 '22

Hi @jglick. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Jan 24 '22

/ok-to-test

Can we add a test case as well?

vincepri, Jan 24 '22

Can we add a test case as well?

Probably above my skill level.

jglick, Jan 24 '22

It looks to me like a side-effect of a more concerning bug. Why is it even nil?

The documentation says there should be a default value if it's empty: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/internal/testing/controlplane/apiserver.go#L51

This line configures it if it is empty: https://github.com/kubernetes-sigs/controller-runtime/blob/5636d975d88e2072884fd82c75b5d3bacf274919/pkg/internal/testing/controlplane/apiserver.go#L264

@DirectXMan12 I think you wrote this, can you help us clarify?

AlmogBaku, Mar 21 '22

This is just a side effect of some other (properly reported) error: there is code which tries to clean up by stopping a service which in this case had not been fully initialized.

jglick, Mar 21 '22
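jglick's explanation describes a general cleanup pattern: a start sequence that fails partway leaves later components uninitialized, and the teardown path has to tolerate that. The sketch below models it with hypothetical `controlPlane`, `fakeEtcd` and `fakeAPIServer` types (assumptions, not the real `ControlPlane`), following the reported sequence in which etcd fails to start, so the API server is never started. Teardown runs in reverse order, skips anything nil or never started, and collects errors with `errors.Join` (Go 1.20 or later).

```go
package main

import (
	"errors"
	"fmt"
)

// fakeEtcd is a hypothetical component whose start can be forced to fail,
// standing in for the etcd timeout in the report.
type fakeEtcd struct {
	failStart bool
	running   bool
}

func (e *fakeEtcd) Start() error {
	if e.failStart {
		return errors.New("timeout waiting for process etcd to start")
	}
	e.running = true
	return nil
}

func (e *fakeEtcd) Stop() error {
	e.running = false
	return nil
}

// fakeAPIServer is a hypothetical component that only becomes usable in Start.
type fakeAPIServer struct {
	running bool
}

func (a *fakeAPIServer) Start() error {
	a.running = true
	return nil
}

// Stop is a no-op for a server that was never started.
func (a *fakeAPIServer) Stop() error {
	if !a.running {
		return nil
	}
	a.running = false
	return nil
}

// controlPlane starts etcd first; if that fails, the API server never starts.
type controlPlane struct {
	Etcd      *fakeEtcd
	APIServer *fakeAPIServer
}

func (c *controlPlane) Start() error {
	if err := c.Etcd.Start(); err != nil {
		return fmt.Errorf("unable to start the control plane: %w", err)
	}
	return c.APIServer.Start()
}

// Stop tears down in reverse start order, skips nil components, and keeps
// going after an error so every component gets a chance to clean up.
func (c *controlPlane) Stop() error {
	var errs []error
	if c.APIServer != nil {
		if err := c.APIServer.Stop(); err != nil {
			errs = append(errs, err)
		}
	}
	if c.Etcd != nil {
		if err := c.Etcd.Stop(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cp := &controlPlane{Etcd: &fakeEtcd{failStart: true}, APIServer: &fakeAPIServer{}}
	if err := cp.Start(); err != nil {
		fmt.Println("start:", err)
		fmt.Println("cleanup:", cp.Stop())
	}
}
```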

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Jun 19 '22

I think this remains valid.

jglick, Jun 23 '22

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot, Jul 23 '22

/assign @hoegaarden

AlmogBaku, Jul 25 '22

@jglick, you should make the test pass in order for us to merge it. Maybe try to rebase?

AlmogBaku, Jul 25 '22

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AlmogBaku, jglick

Once this PR has been reviewed and has the lgtm label, please ask for approval from hoegaarden by writing /assign @hoegaarden in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot, Jul 26 '22

Rebased as suggested. The failing job log at https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_controller-runtime/1785/pull-controller-runtime-test-master/1485692352398888960 is opaque to me.

jglick, Jul 26 '22

/lgtm

FillZpp, Jul 27 '22

Can we add a test case as well?

Probably above my skill level.

There isn't much of a point in merging a fix without a test, the next change might just break it again.

alvaroaleman, Aug 12 '22
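A regression test along the lines alvaroaleman asks for could assert that stopping a never-started server returns instead of panicking. The sketch below is a plain program rather than a `go test` file so that it runs on its own; `server` and `stopPanics` are hypothetical names. In a real `_test.go` file the same helper would feed a `t.Fatalf` call and would exercise the actual type instead of this stand-in.

```go
package main

import "fmt"

// server is a hypothetical type with the same shape as the bug: a field
// that is only set during startup and is used by Stop.
type server struct {
	authn interface{ Stop() error }
}

// Stop is a no-op when authn was never set.
func (s *server) Stop() error {
	if s.authn == nil {
		return nil
	}
	return s.authn.Stop()
}

// stopPanics calls stop and reports whether it panicked, recovering the
// panic so a test can fail with a message instead of crashing the binary.
func stopPanics(stop func() error) (panicked bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return false, stop()
}

func main() {
	s := &server{} // zero value: startup never ran
	panicked, err := stopPanics(s.Stop)
	fmt.Println(panicked, err)
}
```

The helper also catches the original failure mode: wrapping an unguarded call on a nil interface makes it report `true` instead of taking the whole test run down.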

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot, Sep 11 '22

@k8s-triage-robot: Closed this PR.


k8s-ci-robot, Sep 11 '22

In case somebody else lands on this page while running controller tests following this guide: https://book.kubebuilder.io/reference/envtest.html. For me, the issue was that `cfg, err = testEnv.Start()` returned an error about a missing `/usr/local/kubebuilder/bin/etcd` binary. We solved it by pointing envtest at the binaries through the environment variable `KUBEBUILDER_ASSETS=__PROJECT_PATH__/bin/k8s/1.25.0-linux-amd64`.

Fgruntjes, Oct 03 '22