E2E K8s Context Protection
What do you want to happen?
The initial E2E suite created via kubebuilder does not protect against using a production kubernetes context.
This can result in removing monitoring and cert manager from production clusters.
Tilt, for example, only allows the user to run against a context matching kind-* by default, and requires the user to intentionally allow other contexts.
🙏 please add this protection. It is hard to otherwise ensure it doesn't crop up across a company.
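For illustration, here is a minimal sketch (in Go, not part of the current kubebuilder scaffold) of the kind of guard being asked for: refuse to run unless the active kubectl context looks like a local kind cluster, with an explicit allow-list for exceptions, similar to Tilt's allow_k8s_contexts. The function name and allow-list mechanism are assumptions for this example.

```go
package e2e

import (
	"fmt"
	"os/exec"
	"strings"
)

// assertLocalContext is a hypothetical guard: it fails unless the current
// kubectl context starts with "kind-" or is explicitly allow-listed.
func assertLocalContext(allowed ...string) error {
	out, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		return fmt.Errorf("reading current kubectl context: %w", err)
	}
	ctx := strings.TrimSpace(string(out))
	if strings.HasPrefix(ctx, "kind-") {
		return nil
	}
	for _, a := range allowed {
		if ctx == a {
			return nil
		}
	}
	return fmt.Errorf("refusing to run e2e tests against context %q: only kind-* or explicitly allowed contexts are permitted", ctx)
}
```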
Hi @reedjosh,
Thank you for raising this issue.
Could you please let me know if you are using a project with the latest scaffold? I'm asking because this was already addressed there.
- See in the Makefile that we check whether kind is up and running:
https://github.com/kubernetes-sigs/kubebuilder/blob/19237b5aff74056a6729f3584ec2ce878694f503/testdata/project-v4/Makefile#L66-L82
- Then, see the suite test scaffold: before all tests, we try to load the image into kind. If the tests are executed against a cluster that is not kind, this should fail. Kind is not intended to be used in production, so the suite will not run against a production cluster without intentional changes (a rough sketch of this idea follows after this list):
https://github.com/kubernetes-sigs/kubebuilder/blob/19237b5aff74056a6729f3584ec2ce878694f503/testdata/project-v4/test/e2e/e2e_suite_test.go#L78-L108
- Moreover, we added checks to avoid re-installing Prometheus and CertManager, along with more comments to clarify how it works and help users.
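For readers who don't follow the links, the following is only a rough sketch (not the actual scaffold code) of the behaviour described in the second point: attempt to load the test image into kind and abort early if that fails, for example because kind is not running. The helper name and cluster-name parameter are placeholders.

```go
package e2e

import (
	"fmt"
	"os/exec"
)

// loadImageToKind loads the given image into a kind cluster; the cluster
// name here is a placeholder, the real scaffold derives it from the project.
func loadImageToKind(image, clusterName string) error {
	cmd := exec.Command("kind", "load", "docker-image", image, "--name", clusterName)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("failed to load image %q into kind cluster %q: %s: %w",
			image, clusterName, string(out), err)
	}
	return nil
}
```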
If you are using this version, then we might add logic to exit(1) when it is not possible to load the image.
We are looking forward to your reply.
What if KUBECONFIG points to a production cluster and the kind cluster exists? I think the image will still be loaded into the kind cluster (since we pass kind load docker-image --name <CLUSTER_NAME> to tell kind which cluster to load the image into) while KUBECONFIG points to another cluster.
Slight tangent, but this check is probably not working as intended.
When you create a cluster with for example kind create cluster -n my-cluster, the result of kind get clusters will not have the kind- prefix; it will display the same name that was passed to the -n flag. e.g.
$ kind get clusters
linkerd-cluster
my-cluster
The prefix kind- is, however, added to the kubeconfig file. e.g.
cat ~/.kube/config | yq '.clusters.[].name'
"kind-linkerd-cluster"
"kind-my-cluster"
@monteiro-renato I think that's how they ensure that the cluster is not a production cluster. Because usually production clusters are not prefixed with kind-. If you want to run the tests, you must have a kind- prefixed local cluster.
Yeah, I can see it now. When I got that error a while back I found it weird, since I knew I had a kind cluster running with the correct context configured. I'm usually quite paranoid when it comes to tools that default to whatever is configured in the kubeconfig, so I prefer to create separate envs (container or VM) so that I can have peace of mind knowing I will never target a cluster I'm not expecting. But yeah, I guess it's just that the error message could be a bit more explicit about its intent.
But yeah, I think OP's concern is still valid. The validation should probably be done against the context configured in the kubeconfig instead of relying on the output of a kind command.
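A sketch of what that could look like (a hypothetical helper, not existing kubebuilder code): read the kubeconfig's current context and check that it corresponds to a cluster reported by kind get clusters, remembering that kind prefixes context names with kind- but not the cluster names it prints.

```go
package e2e

import (
	"fmt"
	"os/exec"
	"strings"
)

// currentContextIsKindCluster checks that the active kubeconfig context maps
// to one of the clusters that kind knows about.
func currentContextIsKindCluster() error {
	ctxOut, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		return fmt.Errorf("reading current context: %w", err)
	}
	ctx := strings.TrimSpace(string(ctxOut))

	clustersOut, err := exec.Command("kind", "get", "clusters").Output()
	if err != nil {
		return fmt.Errorf("listing kind clusters: %w", err)
	}
	for _, name := range strings.Fields(string(clustersOut)) {
		// kind names the kubeconfig context "kind-<cluster>".
		if ctx == "kind-"+name {
			return nil
		}
	}
	return fmt.Errorf("current context %q does not match any kind cluster", ctx)
}
```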
Hi @monteiro-renato, @damsien, @reedjosh,
How can we validate the context or name to determine whether it’s a production environment for the user? This goes beyond what we can reasonably enforce. Moreover, if users execute any Makefile target against a production environment—such as make deploy, make install, or make undeploy—they may encounter serious issues.
Additionally, frameworks generally don’t handle this kind of validation. For example, tools like Terraform, Helm, etc., don’t verify whether a command targets a production environment. At some level, the user must understand the commands they’re running and the clusters they’re targeting.
On top of that:
- Cluster admins should safeguard production environments by applying RBAC to restrict access for developers.
- Developers should avoid configuring a production context in their local dev environments and just leaving it there. They should ensure that actions targeting production are separated from the development environment to prevent accidental operations. This is not a critical need specifically for e2e tests but applies to all possible actions that developers might execute using kubectl.
It was already discussed. We cannot validate the context configured by developers when they run the commands and say, "Ah, it's production, and it is not for the e2e tests," just as we do not perform such validation for any Makefile targets or features.
I hope that clarifies. I am looking forward to knowing if @reedjosh is or is not using a project with all changes related to this request. (as described in https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2499282543) I really would appreciate it if @reedjosh could share this information with us.
Thank you.
Hey @camilamacedo86,
We could create a kind cluster based on the project's name and then use kind's get kubeconfig subcommand to generate the kubeconfig file to use in the e2e tests. The tests would set the KUBECONFIG env var at the start of the tests to point to the location of the kubeconfig file generated by kind.
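A minimal sketch of that idea, assuming a hypothetical helper and a cluster name passed in by the caller: export the cluster's kubeconfig with kind get kubeconfig, write it to a temporary file, and point KUBECONFIG at it before anything touches the cluster, so the ambient kubeconfig is never used.

```go
package e2e

import (
	"fmt"
	"os"
	"os/exec"
)

// useDedicatedKindKubeconfig writes the kind cluster's kubeconfig to a temp
// file and sets KUBECONFIG to it, returning the file path.
func useDedicatedKindKubeconfig(clusterName string) (string, error) {
	out, err := exec.Command("kind", "get", "kubeconfig", "--name", clusterName).Output()
	if err != nil {
		return "", fmt.Errorf("getting kubeconfig for kind cluster %q: %w", clusterName, err)
	}
	f, err := os.CreateTemp("", "e2e-kubeconfig-*.yaml")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(out); err != nil {
		return "", err
	}
	// From here on, kubectl and client-go default loading rules only see the
	// kind cluster, regardless of what the shell's kubeconfig points to.
	return f.Name(), os.Setenv("KUBECONFIG", f.Name())
}
```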
Hi @camilamacedo86, I agree that it's pretty hard to prevent this, and kubebuilder does not need to protect against all of it.
I think one improvement that is not too hard to achieve would be to extend utils.Kubectl so that --context=value is set only once and added to all kubectl calls.
I've personally run my tests against the wrong context because it was not specified.
How did this happen? I started a long-running e2e test and then switched context in my shell. This cannot happen if utils.Kubectl always specifies the --context.
With other clients (controller-runtime / e2e-testframework clients) this is not so much an issue as we create them before the tests with a hardcoded context.
Could you please let me know if you are using a project with the latest scaffold? I'm just asking because it was already addressed.
I'm sorry for the delayed response. I believe we were using 4.30 at the time. We started the project and saw the issue around the middle of November, and we were using controller-gen v0.16.x.
Without digging into the code I think that the current method is still broken, and implementing protections and workarounds like Tilt does would be ideal.
Tilt checks k8s context names for the kind- prefix. If it's missing, it won't run.
allow_k8s_contexts is then provided in case someone is running minikube or something else without the kind- prefix.
This goes beyond what we can reasonably enforce. Moreover, if users execute any Makefile target against a production environment—such as make deploy, make install, or make undeploy—they may encounter serious issues.
I understand the thinking here, but make deploy does seem like something that would modify a cluster.
make test or make e2e for a tool that nearly every k8s operator builder uses (including those brand new to the ecosystem) does not make it obvious that if you're accidentally logged into a meaningful cluster, things will get wiped out.
I had a similar experience when running make test and make test-e2e; it wasn't clear that the context wasn't kind, and the namespace was deleted.
Hi folks,
As discussed in Slack and another issue, adding --context=value does not seem like a good approach.
However, due to the complexities involved in avoiding this scenario and the fact that it still creates confusion for users, I was thinking about how to simplify it and solve the concerns raised. Additionally, since we now scaffold GitHub Actions to run E2E tests, we can improve this by:
- Removing the Golang code that installs CertManager from the E2E tests. (ref: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/test/e2e/e2e_suite_test.go#L66-L81)
- Updating the E2E tests to validate the sample projects and docs tutorials for installing CertManager. (https://github.com/kubernetes-sigs/kubebuilder/blob/master/.github/workflows/test-e2e-book.yml and https://github.com/kubernetes-sigs/kubebuilder/blob/master/.github/workflows/test-e2e-samples.yml)
- Modifying the GitHub Actions scaffold to install CertManager explicitly before running tests. (ref: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/.github/workflows/test-e2e.yml)
- Removing unnecessary complexities from the Makefile targets to simplify the test setup. (here: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/Makefile#L64-L78)
This approach would make the test framework more predictable, reduce complexity, align with best practices for managing dependencies in CI/CD pipelines, and address all the concerns raised.
If someone would like to work on this one, please feel free!
@monteiro-renato, @damsien, @reedjosh: WDYT? Please feel free to contribute if you wish.
I think that sounds great! My only remaining concern is that this also seems to install, and thus remove, metrics. Would that be subject to the same changes?
Thank you for your continued work in making kubebuilder an excellent project!
Hi @reedjosh,
Following the approach outlined in the comment above (https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2655221151), external dependencies are no longer installed via the e2e tests. Instead, third-party dependencies are now handled through GitHub Actions, which seems to be the best approach.
This means that when you run make test-e2e, the project will be built, loaded with Kind, and the tests will be executed. Any required dependencies, such as CertManager or Prometheus, are managed within the GitHub Actions workflow.
Let me know if you have any questions and if the above answers your concerns.
As discussed in Slack and another issue, adding --context=value does not seem like a good approach
Do you have a link to that? I would like to understand the reasoning.
The protection is really important to also have at runtime. E2E tests are often quite slow, so there is a risk that someone switches the context. We, for example, already protect against many cases at startup (this one is easy).
As for removing the CertManager setup etc.: I personally like this setup, although so far we set up everything before testing with kind/make. It would still mean that you could potentially run the E2E tests in the wrong context (production!), which might do something undesirable to the resources your tests modify.
Hi @Sijoma,
Let’s go step by step to clarify the situation.
What is the problem?
When running make test-e2e, it installs Prometheus and CertManager.
If the current context is pointing to a cluster where these components are already installed, it may cause conflicts.
Why is this issue already resolved in the latest scaffold?
The latest scaffold has addressed this issue with the following changes:
✅ Prometheus is no longer installed.
✅ CertManager is only installed if:
- Kind is running locally (Makefile#L63-L78)
- None of the CertManager CRDs are found in the cluster (e2e_suite_test.go#L71-L80; a rough sketch of such a check follows after this list)
- The kind load image command can be executed (e2e_suite_test.go#L63-L65)
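As a rough illustration of the second bullet (not the scaffold's actual helper; the function name is a placeholder), such a check could look like this:

```go
package e2e

import (
	"os/exec"
	"strings"
)

// certManagerCRDsInstalled reports whether any cert-manager.io CRDs are
// already present in the target cluster, so the install step can be skipped.
func certManagerCRDsInstalled() bool {
	out, err := exec.Command("kubectl", "get", "crds", "-o", "name").Output()
	if err != nil {
		return false
	}
	return strings.Contains(string(out), "cert-manager.io")
}
```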
Given these updates, I don’t see how the original issue persists.
The scaffold now ensures that CertManager is only installed when necessary, and no kind- prefix configuration is required.
That’s why I suggested checking if you’re using the latest version.
Why don’t we set --context=value by default in the Makefile?
Following is a summary. This has been discussed in other places, like Slack, Issue 4116 - Comment, and Issue 4391 - Comment.
1. How Do We Determine a Production Environment?
- There is no universal way to validate whether a cluster is a production environment.
- If we enforce a context check here, we would also need to check other Makefile targets (make deploy, make install, make undeploy), which could also impact production environments.
- The same accidental scenario can occur with ANY target. Why was this never requested for other targets? Why should we add it only for make test-e2e?
2. Other Tools Do Not Enforce This
- Terraform, Helm, and Kubernetes (kubectl) do NOT prevent users from running commands against production environments.
- It is ultimately the user's responsibility to ensure they are targeting the correct cluster.
3. Checking Context Does Not Solve the Issue
- Developers can still accidentally run tests against a production cluster by triggering e2e tests directly, without using make test-e2e.
- Using the Makefile is not mandatory. Many developers (including myself) trigger tests directly from an IDE.
- In CI workflows, the Makefile target is typically used, but developers often run tests locally in different ways. Therefore, the suggestion of --context=value does NOT solve the issue described here.
"This change would negatively impact users, as it would introduce unnecessary complexities into Kubebuilder that all users do not commonly require. It would also create inconsistent behaviour compared to other targets and deviate from standard practices followed by other tools. Additionally, it offers NO little real protection against misconfiguration and does not effectively address the underlying issue."
Alternative Approach – GitHub Actions for Managing Dependencies
If some users still think this is an issue, an alternative approach has already been proposed:
Ensure that GitHub Actions install the required dependencies instead.
📌 Discussed here: Issue 4391 - Comment
How This Works:
- External dependencies (e.g., CertManager, Prometheus) are no longer installed via e2e tests.
- Instead, they are managed within GitHub Actions.
- When running make test-e2e:
  - The project is built
  - It is loaded with Kind
  - The tests are executed
- All required dependencies are handled by GitHub Actions, preventing unnecessary installations in local environments.
Pros of This Approach
- Simplifies the overall setup in the Makefile and Go test code
Cons of This Approach
- Makes it harder for devs to run the e2e tests locally and debug, since they will need a pre-setup in place. Alternatively, we would need to study the possibility of having a setup-e2e target instead of installing Prometheus via GitHub Actions.
I hope this clarifies everything. Looking forward to your thoughts!
Thank you! 🚀
Hi @Sijoma, @reedjosh, @damsien, @jravetch, @monteiro-renato,
After further consideration, I’ve updated the pros and cons of the alternative solution proposed in the comment above: link to comment, where we discuss centralizing everything.
Honestly, I believe the best course of action here is:
- Closing this issue as resolved – The problem originally raised can no longer occur in the latest versions.
- Rejecting the proposed solution to modify the Makefile target to add context – This does not address the problem at all. As outlined earlier, E2E tests can be triggered via an IDE (which is often the case), meaning the same issue would still occur. Additionally, this approach would introduce unnecessary complexity, deviate from common standards, and not provide significant benefits to general users. However, end users are always free to modify their own scaffold if needed. (more info: https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2656404852)
- Considering an enhancement properly – If we want to improve this, I think the best approach is first to analyze the pros and cons carefully. So far, it seems a proposal (PEP) would be required; referencing the PEP template would be helpful. Frequent back-and-forth changes to the scaffold lead to a poor user experience: these changes only affect new projects, yet users still need to update, and they impact workflows and pipeline configurations. For any potential change here, I think the best approach is to write a design doc, consider all pros and cons, and ask for community feedback before moving forward.
So, I would suggest we close this issue based on the above considerations.
That said, I believe the best way forward would be:
- (A) Open a new issue with clear steps to reproduce the scenario based on the testdata samples (master branch), OR/AND
- (B) Open a PR proposing changes using the PEP template.
Let me know your thoughts!
Totally fine for me 👍 . It's a never ending problem. Also agree that adding the context to the Makefile does not help at all.
Rejecting the proposed solution to modify the Makefile target to add context
Just to clarify: I personally meant adding the --context in the code by extending utils.Kubectl to specify a context for the whole test suite. The same goes for the Kubernetes context config. This could then be loaded from the environment once at startup of the test suite.
This would also protect you when you use the IDE to run tests (like I do too).
We wrap the utils.Kubectl from Kubebuilder like so.
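The original snippet is not shown above, so the following is only a sketch of what such a wrapper could look like, with an assumed env var name and helper signature (this is not kubebuilder's utils API): the context is resolved once at suite start-up and injected into every kubectl invocation, so switching contexts in your shell mid-run has no effect.

```go
package utils

import (
	"os"
	"os/exec"
)

// pinnedContext is resolved once, e.g. from an env var exported before the
// suite starts (the variable name is an assumption for this sketch).
var pinnedContext = os.Getenv("E2E_KUBE_CONTEXT")

// Kubectl builds a kubectl command with --context always pinned.
func Kubectl(args ...string) *exec.Cmd {
	if pinnedContext != "" {
		args = append([]string{"--context", pinnedContext}, args...)
	}
	return exec.Command("kubectl", args...)
}
```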
Thank you @camilamacedo86 !
We were recently bitten by this again, and it broke a cluster. Really frustrating! I created a different issue for this, #4116, which you closed without understanding what I was proposing. So I'll bring up the suggestion here as well:
Change end-to-end tests to be opt-in!
The default Makefile now checks that the kubeconfig targets the correct context, but if you run go test ./... that check is not made. So this check needs to be moved into the Go code, or you can just make the tests opt-in by using Go build tags:
//go:build e2e
// +build e2e
Then in your Makefile you would just add the -tags e2e flag to the command:
.PHONY: test-e2e # Run the e2e tests against a Kind k8s instance that is spun up.
test-e2e:
- go test ./test/e2e/ -v -ginkgo.v
+ go test ./test/e2e/ -v -tags e2e -ginkgo.v
This gives a much better signal of intent from the user. You either run it via make test-e2e, where the kubeconfig context kind- prefix check is made, or you run it with go test ./... -tags e2e, where you have to know the tag beforehand, so you'll probably be far better informed about what it implies.
I'm not saying you shouldn't add the --context flag to every kubectl command. That's still a very good idea. But one improvement does not exclude the other.
Hi @applejag,
We were recently bit by this again and this broke a cluster
It seems that you have not updated your project with the latest changes. I have no idea how that would be possible with the latest changes in place. Can you please ensure that you have your project with the latest changes? If so, can you please let us know what steps were performed and what issue was faced?
Suggestion to use tags
Since you clarified your suggestion at https://github.com/kubernetes-sigs/kubebuilder/issues/4116, I reopened this one. It has no relation with the scope of this issue. It is another suggestion right?
We were recently bit by this again and this broke a cluster
It seems that you have not updated your project with the latest changes. I have no idea how that would be possible with the latest changes in place. Can you please ensure that you have your project with the latest changes? If so, can you please let us know what steps were performed and what issue was faced?
An engineer ran go test ./... in our repo and the tests overwrote our certmanager and Prometheus CRDs in one of our clusters. But perhaps we've not updated far enough. Are you saying this will not happen anymore?
Suggestion to use tags
Since you clarified your suggestion at https://github.com/kubernetes-sigs/kubebuilder/issues/4116, I reopened this one. It has no relation with the scope of this issue. It is another suggestion right?
It is another suggestion, yes.
Hi @applejag,
An engineer ran go test ./... in our repo and the tests overwrote our certmanager and Prometheus CRDs in one of our clusters. But perhaps we've not updated far enough. Are you saying this will not happen anymore?
We no longer install Prometheus, so it's not possible to re-install it using the latest version of the scaffold.
Furthermore, using Kind in a production environment is not a viable option. As previously explained, the tests are specifically designed to run in Kind, and they will fail to load the image outside of that environment. You can find more context here: https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2656404852
It seems your project may be missing some of the recent updates related to this change.
Please ensure that your project is upgraded to the latest scaffold. If you still encounter issues, feel free to open a new issue with a clear description of the steps you followed, so we can better assist.
Nice, thanks!
Sorry if I came off as hostile because I could feel I was a bit mad about the situation. Excellent that you've employed so many contingencies already.
I still believe the e2e build tag should be added, but that's for the other issue.
Will look into upgrading our scaffold promptly!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Hey folks 👋
Thanks again for all the feedback and concerns raised — it’s been super helpful. Based on what was discussed, we've made several improvements:
✅ What's been addressed
- Prometheus is no longer installed in E2E tests: it wasn't actively used, so removing it eliminates ~50% of the original issues raised.
- Kind status checks before tests: we now check that kind is installed and the cluster is up before running tests.
- Image loading for cert-manager: before running tests and installing cert-manager, we attempt to load the required image into kind. If kind isn't running or we're targeting another environment, the tests will fail early.
- Smart cert-manager handling: we've added a utility that checks if cert-manager APIs are already applied. If so, we skip both the install and uninstall steps.
- More flexible test runs: we've improved documentation and added an ENV VAR so you can run E2E tests without cert-manager if needed.
- E2E test tagging: E2E tests are now properly tagged, so running go test ./... won't accidentally trigger them: https://github.com/kubernetes-sigs/kubebuilder/pull/4946
✅ Conclusion
We believe these updates fully address the concerns raised in this issue, so I’ll go ahead and close it.
However, if you check the master branch and still see something missing or have ideas for improvement, please feel free to:
- Open a new issue with the specific request or problem (ideally, one per issue so we can better track and address them), or
- Send us a PR — contributions are always welcome!
Your feedback and contributions mean a lot to the Kubebuilder maintainers and the community 💛
Thanks again!