E2E K8s Context Protection
What do you want to happen?
The initial E2E suite created via kubebuilder does not protect against using a production kubernetes context.
This can result in removing monitoring and cert manager from production clusters.
Tilt, for example, only allows the user to run against a context matching kind-* by default, and requires the user to intentionally allow other contexts.
🙏 please add this protection. It is hard to otherwise ensure it doesn't crop up across a company.
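For illustration, here is a minimal sketch (in Go, not part of the current kubebuilder scaffold) of the kind of guard being asked for: refuse to run unless the active kubectl context looks like a local kind cluster, with an explicit allow-list for exceptions, similar to Tilt's allow_k8s_contexts. The function name and allow-list mechanism are assumptions for this example.

```go
package e2e

import (
	"fmt"
	"os/exec"
	"strings"
)

// assertLocalContext is a hypothetical guard: it fails unless the current
// kubectl context starts with "kind-" or is explicitly allow-listed.
func assertLocalContext(allowed ...string) error {
	out, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		return fmt.Errorf("reading current kubectl context: %w", err)
	}
	ctx := strings.TrimSpace(string(out))
	if strings.HasPrefix(ctx, "kind-") {
		return nil
	}
	for _, a := range allowed {
		if ctx == a {
			return nil
		}
	}
	return fmt.Errorf("refusing to run e2e tests against context %q: only kind-* or explicitly allowed contexts are permitted", ctx)
}
```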
Hi @reedjosh,
Thank you for raising this issue.
Could you please let me know if you are using a project with the latest scaffold? I'm asking because this was already addressed there.
- See in the Makefile that we check whether kind is up and running:
https://github.com/kubernetes-sigs/kubebuilder/blob/19237b5aff74056a6729f3584ec2ce878694f503/testdata/project-v4/Makefile#L66-L82
- Then, see the suite test scaffold: before all tests, we try to load the image into kind. If the tests are executed against a cluster that is not kind, this should fail. Kind is not intended to be used in production, so the suite will not run against a production cluster without intentional changes (a rough sketch of this idea follows after this list):
https://github.com/kubernetes-sigs/kubebuilder/blob/19237b5aff74056a6729f3584ec2ce878694f503/testdata/project-v4/test/e2e/e2e_suite_test.go#L78-L108
- Moreover, we added checks to avoid re-installing Prometheus and CertManager, along with more comments to clarify how it works and help users.
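For readers who don't follow the links, the following is only a rough sketch (not the actual scaffold code) of the behaviour described in the second point: attempt to load the test image into kind and abort early if that fails, for example because kind is not running. The helper name and cluster-name parameter are placeholders.

```go
package e2e

import (
	"fmt"
	"os/exec"
)

// loadImageToKind loads the given image into a kind cluster; the cluster
// name here is a placeholder, the real scaffold derives it from the project.
func loadImageToKind(image, clusterName string) error {
	cmd := exec.Command("kind", "load", "docker-image", image, "--name", clusterName)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("failed to load image %q into kind cluster %q: %s: %w",
			image, clusterName, string(out), err)
	}
	return nil
}
```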
If you are using this version, then we might add logic to exit(1) when it is not possible to load the image.
We are looking forward to your reply.
What if KUBECONFIG points to a production cluster and the kind cluster exists? I think the image will still be loaded into the kind cluster (since we pass kind load docker-image --name <CLUSTER_NAME> to tell kind which cluster to load the image into) while KUBECONFIG points to another cluster.
Slight tangent, but this check is probably not working as intended.
When you create a cluster with for example kind create cluster -n my-cluster, the result of kind get clusters will not have the kind- prefix; it will display the same name that was passed to the -n flag. e.g.
$ kind get clusters
linkerd-cluster
my-cluster
The prefix kind- is, however, added to the kubeconfig file. e.g.
cat ~/.kube/config | yq '.clusters.[].name'
"kind-linkerd-cluster"
"kind-my-cluster"
@monteiro-renato I think that's how they ensure that the cluster is not a production cluster. Because usually production clusters are not prefixed with kind-. If you want to run the tests, you must have a kind- prefixed local cluster.
Yeah, I can see it now. When I got that error a while back I found it weird, since I knew I had a kind cluster running with the correct context configured. I'm usually quite paranoid when it comes to tools that default to whatever is configured in the kubeconfig, so I prefer to create separate envs (container or VM) so that I can have peace of mind knowing I will never target a cluster I'm not expecting. But yeah, I guess it's just that the error message could be a bit more explicit about its intent.
But yeah, I think OP's concern is still valid. The validation should probably be done against the context configured in the kubeconfig instead of relying on the output of a kind command.
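A sketch of what that could look like (a hypothetical helper, not existing kubebuilder code): read the kubeconfig's current context and check that it corresponds to a cluster reported by kind get clusters, remembering that kind prefixes context names with kind- but not the cluster names it prints.

```go
package e2e

import (
	"fmt"
	"os/exec"
	"strings"
)

// currentContextIsKindCluster checks that the active kubeconfig context maps
// to one of the clusters that kind knows about.
func currentContextIsKindCluster() error {
	ctxOut, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		return fmt.Errorf("reading current context: %w", err)
	}
	ctx := strings.TrimSpace(string(ctxOut))

	clustersOut, err := exec.Command("kind", "get", "clusters").Output()
	if err != nil {
		return fmt.Errorf("listing kind clusters: %w", err)
	}
	for _, name := range strings.Fields(string(clustersOut)) {
		// kind names the kubeconfig context "kind-<cluster>".
		if ctx == "kind-"+name {
			return nil
		}
	}
	return fmt.Errorf("current context %q does not match any kind cluster", ctx)
}
```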
Hi @monteiro-renato, @damsien, @reedjosh,
How can we validate the context or name to determine whether it’s a production environment for the user? This goes beyond what we can reasonably enforce. Moreover, if users execute any Makefile target against a production environment—such as make deploy, make install, or make undeploy—they may encounter serious issues.
Additionally, frameworks generally don’t handle this kind of validation. For example, tools like Terraform, Helm, etc., don’t verify whether a command targets a production environment. At some level, the user must understand the commands they’re running and the clusters they’re targeting.
On top of that:
- Cluster admins should safeguard production environments by applying RBAC to restrict access for developers.
- Developers should avoid configuring a production context in their local dev environments and just leaving it there. They should ensure that actions targeting production are separated from the development environment to prevent accidental operations. This is not a critical need specifically for e2e tests but applies to all possible actions that developers might execute using kubectl.
It was already discussed. We cannot validate the context configured by developers when they run the commands and say, "Ah, it's production, and it is not for the e2e tests," just as we do not perform such validation for any Makefile targets or features.
I hope that clarifies. I am looking forward to knowing if @reedjosh is or is not using a project with all changes related to this request. (as described in https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2499282543) I really would appreciate it if @reedjosh could share this information with us.
Thank you.
Hey @camilamacedo86,
We could create a kind cluster based on the project's name and then use kind's get kubeconfig subcommand to generate the kubeconfig file to use in the e2e tests. The tests would set the KUBECONFIG env var at the start of the tests to point to the location of the kubeconfig file generated by kind.
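A minimal sketch of that idea, assuming a hypothetical helper and a cluster name passed in by the caller: export the cluster's kubeconfig with kind get kubeconfig, write it to a temporary file, and point KUBECONFIG at it before anything touches the cluster, so the ambient kubeconfig is never used.

```go
package e2e

import (
	"fmt"
	"os"
	"os/exec"
)

// useDedicatedKindKubeconfig writes the kind cluster's kubeconfig to a temp
// file and sets KUBECONFIG to it, returning the file path.
func useDedicatedKindKubeconfig(clusterName string) (string, error) {
	out, err := exec.Command("kind", "get", "kubeconfig", "--name", clusterName).Output()
	if err != nil {
		return "", fmt.Errorf("getting kubeconfig for kind cluster %q: %w", clusterName, err)
	}
	f, err := os.CreateTemp("", "e2e-kubeconfig-*.yaml")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(out); err != nil {
		return "", err
	}
	// From here on, kubectl and client-go default loading rules only see the
	// kind cluster, regardless of what the shell's kubeconfig points to.
	return f.Name(), os.Setenv("KUBECONFIG", f.Name())
}
```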
Hi @camilamacedo86, I agree that it's pretty hard to prevent this, and kubebuilder does not need to protect against all of it.
I think one improvement that is not too hard to achieve would be to extend utils.Kubectl so that --context=value is set only once and added to all kubectl calls.
I've personally run my tests against the wrong context because it was not specified.
How did this happen? I started a long-running e2e test and then switched context in my shell. This cannot happen if utils.Kubectl always specifies the --context.
With other clients (controller-runtime / e2e-testframework clients) this is not so much an issue as we create them before the tests with a hardcoded context.
Could you please let me know if you are using a project with the latest scaffold? I'm just asking because it was already addressed.
I'm sorry for the delayed response. I believe we were using 4.30 at the time. We started the project and saw the issue around the middle of November, and we were using controller-gen v0.16.x.
Without digging into the code I think that the current method is still broken, and implementing protections and workarounds like Tilt does would be ideal.
Tilt checks k8s context names for the kind- prefix. If it's missing, it won't run.
allow_k8s_contexts is then provided in case someone is running minikube or something else without the kind- prefix.
This goes beyond what we can reasonably enforce. Moreover, if users execute any Makefile target against a production environment—such as make deploy, make install, or make undeploy—they may encounter serious issues.
I understand the thinking here, but make deploy does seem like something that would modify a cluster.
make test or make e2e for a tool that nearly every k8s operator builder uses (including those brand new to the ecosystem) does not make it obvious that if you're accidentally logged into a meaningful cluster, things will get wiped out.
I had a similar experience when running make test and make test-e2e; it wasn't clear that the context wasn't kind, and the namespace was deleted.
Hi folks,
As discussed in Slack and another issue, adding --context=value does not seem like a good approach.
However, due to the complexities involved in avoiding this scenario and the fact that it still creates confusion for users, I was thinking about how to simplify it and solve the concerns raised. Additionally, since we now scaffold GitHub Actions to run E2E tests, we can improve this by:
- Removing the Golang code that installs CertManager from the E2E tests. (ref: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/test/e2e/e2e_suite_test.go#L66-L81)
- Updating the E2E tests to validate the sample projects and docs tutorials for installing CertManager. (https://github.com/kubernetes-sigs/kubebuilder/blob/master/.github/workflows/test-e2e-book.yml and https://github.com/kubernetes-sigs/kubebuilder/blob/master/.github/workflows/test-e2e-samples.yml)
- Modifying the GitHub Actions scaffold to install CertManager explicitly before running tests. (ref: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/.github/workflows/test-e2e.yml)
- Removing unnecessary complexities from the Makefile targets to simplify the test setup. (here: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/Makefile#L64-L78)
This approach would make the test framework more predictable, reduce complexity, align with best practices for managing dependencies in CI/CD pipelines, and address all the concerns raised.
If someone would like to work on this one, please feel free!
@monteiro-renato, @damsien, @reedjosh: WDYT? Please feel free to contribute if you wish.
I think that sounds great! My only remaining concern is that this also seems to install, and thus remove, metrics. Would that be subject to the same changes?
Thank you for your continued work in making kubebuilder an excellent project!
Hi @reedjosh,
Following the approach outlined in the comment above (https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2655221151), external dependencies are no longer installed via the e2e tests. Instead, third-party dependencies are now handled through GitHub Actions, which seems to be the best approach.
This means that when you run make test-e2e, the project will be built, loaded with Kind, and the tests will be executed. Any required dependencies, such as CertManager or Prometheus, are managed within the GitHub Actions workflow.
Let me know if you have any questions and if the above answers your concerns.
As discussed in Slack and another issue, adding --context=value does not seem like a good approach
Do you have a link to that? I would like to understand the reasoning.
The protection is really important to also have at runtime. E2E tests are often quite slow, so there is a risk that someone switches the context. We, for example, already protect against many cases at startup (this one is easy).
As for removing the CertManager setup etc.: I personally like this setup, although so far we set up everything before testing with kind/make. It would still mean that you could potentially run the E2E tests in the wrong context (production!), which might do something undesirable to the resources your tests modify.
Hi @Sijoma,
Let’s go step by step to clarify the situation.
What is the problem?
When running make test-e2e, it installs Prometheus and CertManager.
If the current context is pointing to a cluster where these components are already installed, it may cause conflicts.
Why is this issue already resolved in the latest scaffold?
The latest scaffold has addressed this issue with the following changes:
✅ Prometheus is no longer installed.
✅ CertManager is only installed if:
- Kind is running locally (Makefile#L63-L78)
- None of the CertManager CRDs are found in the cluster (e2e_suite_test.go#L71-L80; a rough sketch of such a check follows after this list)
- The kind load image command can be executed (e2e_suite_test.go#L63-L65)
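As a rough illustration of the second bullet (not the scaffold's actual helper; the function name is a placeholder), such a check could look like this:

```go
package e2e

import (
	"os/exec"
	"strings"
)

// certManagerCRDsInstalled reports whether any cert-manager.io CRDs are
// already present in the target cluster, so the install step can be skipped.
func certManagerCRDsInstalled() bool {
	out, err := exec.Command("kubectl", "get", "crds", "-o", "name").Output()
	if err != nil {
		return false
	}
	return strings.Contains(string(out), "cert-manager.io")
}
```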
Given these updates, I don’t see how the original issue persists.
The scaffold now ensures that CertManager is only installed when necessary, and no kind- prefix configuration is required.
That’s why I suggested checking if you’re using the latest version.
Why don’t we set --context=value by default in the Makefile?
Following is a summary. This has been discussed in other places, like Slack, Issue 4116 - Comment, and Issue 4391 - Comment.
1. How Do We Determine a Production Environment?
- There is no universal way to validate whether a cluster is a production environment.
- If we enforce a context check here, we would also need to check other Makefile targets (make deploy, make install, make undeploy), which could also impact production environments.
- The same accidental scenario can occur with ANY target. Why was this never requested for other targets? Why should we add it only for make test-e2e?
2. Other Tools Do Not Enforce This
- Terraform, Helm, and Kubernetes (kubectl) do NOT prevent users from running commands against production environments.
- It is ultimately the user's responsibility to ensure they are targeting the correct cluster.
3. Checking Context Does Not Solve the Issue
- Developers can still accidentally run tests against a production cluster by triggering e2e tests directly, without using make test-e2e.
- Using the Makefile is not mandatory. Many developers (including myself) trigger tests directly from an IDE.
- In CI workflows, the Makefile target is typically used, but developers often run tests locally in different ways. Therefore, the suggestion of --context=value does NOT solve the issue described here.
"This change would negatively impact users, as it would introduce unnecessary complexities into Kubebuilder that all users do not commonly require. It would also create inconsistent behaviour compared to other targets and deviate from standard practices followed by other tools. Additionally, it offers NO little real protection against misconfiguration and does not effectively address the underlying issue."
Alternative Approach – GitHub Actions for Managing Dependencies
If some users still think this is an issue, an alternative approach has already been proposed:
Ensure that GitHub Actions install the required dependencies instead.
📌 Discussed here: Issue 4391 - Comment
How This Works:
- External dependencies (e.g., CertManager, Prometheus) are no longer installed via e2e tests.
- Instead, they are managed within GitHub Actions.
- When running make test-e2e:
  - The project is built
  - It is loaded with Kind
  - The tests are executed
- All required dependencies are handled by GitHub Actions, preventing unnecessary installations in local environments.
Pros of This Approach
- Simplifies the overall setup in the Makefile and Go test code
Cons of This Approach
- Makes it harder for devs to run the e2e tests locally and debug, since they will need a pre-setup in place. Alternatively, we would need to study the possibility of having a setup-e2e target instead of installing Prometheus via GitHub Actions.
I hope this clarifies everything. Looking forward to your thoughts!
Thank you! 🚀
Hi @Sijoma, @reedjosh, @damsien, @jravetch, @monteiro-renato,
After further consideration, I’ve updated the pros and cons of the alternative solution proposed in the comment above: link to comment, where we discuss centralizing everything.
Honestly, I believe the best course of action here is:
- Closing this issue as resolved – The problem originally raised can no longer occur in the latest versions.
- Rejecting the proposed solution to modify the Makefile target to add context – This does not address the problem at all. As outlined earlier, E2E tests can be triggered via an IDE (which is often the case), meaning the same issue would still occur. Additionally, this approach would introduce unnecessary complexity, deviate from common standards, and not provide significant benefits to general users. However, end users are always free to modify their own scaffold if needed. (more info: https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2656404852)
- Considering an enhancement properly – If we want to improve this, I think the best approach is first to analyze the pros and cons carefully. So far, it seems a proposal (PEP) would be required; referencing the PEP template would be helpful. Frequent back-and-forth changes to the scaffold lead to a poor user experience: these changes only affect new projects, yet users still need to update, and they impact workflows and pipeline configurations. For any potential change here, I think the best approach is to write a design doc, consider all pros and cons, and ask for community feedback before moving forward.
So, I would suggest we close this issue based on the above considerations.
That said, I believe the best way forward would be:
- (A) Open a new issue with clear steps to reproduce the scenario based on the testdata samples (master branch), OR/AND
- (B) Open a PR proposing changes using the PEP template.
Let me know your thoughts!
Totally fine for me 👍 . It's a never ending problem. Also agree that adding the context to the Makefile does not help at all.
Rejecting the proposed solution to modify the Makefile target to add context
Just to clarify: I personally meant adding the --context in the code by extending utils.Kubectl to specify a context for the whole test suite. The same goes for the Kubernetes context config. This could then be loaded from the environment once at startup of the test suite.
This would also protect you when you use the IDE to run tests (like I do too).
We wrap the utils.Kubectl from Kubebuilder like so.
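The original snippet is not shown above, so the following is only a sketch of what such a wrapper could look like, with an assumed env var name and helper signature (this is not kubebuilder's utils API): the context is resolved once at suite start-up and injected into every kubectl invocation, so switching contexts in your shell mid-run has no effect.

```go
package utils

import (
	"os"
	"os/exec"
)

// pinnedContext is resolved once, e.g. from an env var exported before the
// suite starts (the variable name is an assumption for this sketch).
var pinnedContext = os.Getenv("E2E_KUBE_CONTEXT")

// Kubectl builds a kubectl command with --context always pinned.
func Kubectl(args ...string) *exec.Cmd {
	if pinnedContext != "" {
		args = append([]string{"--context", pinnedContext}, args...)
	}
	return exec.Command("kubectl", args...)
}
```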
Thank you @camilamacedo86 !
We were recently bitten by this again, and it broke a cluster. Really frustrating! I created a different issue for this, #4116, which you closed without understanding what I was proposing. So I'll bring up the suggestion here as well:
Change end-to-end tests to be opt-in!
The default Makefile now checks that the kubeconfig targets the correct context, but if you run go test ./... that check is not made. So this check needs to be moved into the Go code, or you can just make the tests opt-in by using Go build tags:
//go:build e2e
// +build e2e
Then in your Makefile you would just add the -tags e2e flag to the command:
.PHONY: test-e2e # Run the e2e tests against a Kind k8s instance that is spun up.
test-e2e:
- go test ./test/e2e/ -v -ginkgo.v
+ go test ./test/e2e/ -v -tags e2e -ginkgo.v
This gives a much better signal of intent from the user. You either run it via make test-e2e, where the kubeconfig context kind- prefix check is made, or you run it with go test ./... -tags e2e, where you have to know the tag beforehand, so you'll probably be far better informed about what it implies.
I'm not saying you shouldn't add the --context flag to every kubectl command. That's still a very good idea. But one improvement does not exclude the other.
Hi @applejag,
We were recently bit by this again and this broke a cluster
It seems that you have not updated your project with the latest changes. I have no idea how that would be possible with the latest changes in place. Can you please ensure that you have your project with the latest changes? If so, can you please let us know what steps were performed and what issue was faced?
Suggestion to use tags
Since you clarified your suggestion at https://github.com/kubernetes-sigs/kubebuilder/issues/4116, I reopened this one. It has no relation with the scope of this issue. It is another suggestion right?
We were recently bit by this again and this broke a cluster
It seems that you have not updated your project with the latest changes. I have no idea how that would be possible with the latest changes in place. Can you please ensure that you have your project with the latest changes? If so, can you please let us know what steps were performed and what issue was faced?
An engineer ran go test ./... in our repo and the tests overwrote our certmanager and Prometheus CRDs in one of our clusters. But perhaps we've not updated far enough. Are you saying this will not happen anymore?
Suggestion to use tags
Since you clarified your suggestion at https://github.com/kubernetes-sigs/kubebuilder/issues/4116, I reopened this one. It has no relation with the scope of this issue. It is another suggestion right?
It is another suggestion, yes.
Hi @applejag,
An engineer ran go test ./... in our repo and the tests overwrote our certmanager and Prometheus CRDs in one of our clusters. But perhaps we've not updated far enough. Are you saying this will not happen anymore?
We no longer install Prometheus, so it's not possible to re-install it using the latest version of the scaffold.
Furthermore, using Kind in a production environment is not a viable option. As previously explained, the tests are specifically designed to run in Kind, and they will fail to load the image outside of that environment. You can find more context here: https://github.com/kubernetes-sigs/kubebuilder/issues/4391#issuecomment-2656404852
It seems your project may be missing some of the recent updates related to this change.
Please ensure that your project is upgraded to the latest scaffold. If you still encounter issues, feel free to open a new issue with a clear description of the steps you followed, so we can better assist.
Nice, thanks!
Sorry if I came off as hostile because I could feel I was a bit mad about the situation. Excellent that you've employed so many contingencies already.
I still believe the e2e build tag should be added, but that's for the other issue.
Will look into upgrading our scaffold promptly!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Hey folks 👋
Thanks again for all the feedback and concerns raised — it’s been super helpful. Based on what was discussed, we've made several improvements:
✅ What's been addressed
- Prometheus is no longer installed in E2E tests: it wasn't actively used, so removing it eliminates ~50% of the original issues raised.
- Kind status checks before tests: we now check that kind is installed and the cluster is up before running tests.
- Image loading for cert-manager: before running tests and installing cert-manager, we attempt to load the required image into kind. If kind isn't running or we're targeting another environment, the tests will fail early.
- Smart cert-manager handling: we've added a utility that checks if cert-manager APIs are already applied. If so, we skip both the install and uninstall steps.
- More flexible test runs: we've improved documentation and added an ENV VAR so you can run E2E tests without cert-manager if needed.
- E2E test tagging: E2E tests are now properly tagged, so running go test ./... won't accidentally trigger them: https://github.com/kubernetes-sigs/kubebuilder/pull/4946
✅ Conclusion
We believe these updates fully address the concerns raised in this issue, so I’ll go ahead and close it.
However, if you check the master branch and still see something missing or have ideas for improvement, please feel free to:
- Open a new issue with the specific request or problem (ideally, one per issue so we can better track and address them), or
- Send us a PR — contributions are always welcome!
Your feedback and contributions mean a lot to the Kubebuilder maintainers and the community 💛
Thanks again!