Support custom list of services to be added to /etc/hosts in cluster DNS operator - RFE-4145
Prerequisite for PR https://github.com/openshift/cluster-dns-operator/pull/441, due to the new field introduced in the DNS CRD.
Hello @t-cas! Some important instructions when contributing to openshift/api: API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.
Hi @t-cas. Thanks for your PR.
I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Do we have an EP explaining what this new feature is? As @everettraven mentions, this should be behind a feature gate as a new feature extending an existing API
> Do we have an EP explaining what this new feature is? As @everettraven mentions, this should be behind a feature gate as a new feature extending an existing API
@JoelSpeed I just added more details about the proposal as part of RFE https://issues.redhat.com/browse/RFE-4145, tell me if you have any questions or need more details.
Also, thanks for the suggestion regarding the feature gate, but I'd like to clarify the rationale behind not introducing one in this case. This enhancement adds a new optional field (nodeServices) to the existing DNS.operator.openshift.io API. If this field is not set, the behavior remains exactly as it is today, with the default hardcoded service (image-registry.openshift-image-registry.svc) being used. In that sense, the current implementation already acts as a natural feature gate: if the field is unset, the feature is effectively disabled.

The goal of this change is to make an existing behavior configurable, not to introduce a fundamentally new capability. It extends the current functionality of injecting DNS service entries into /etc/hosts at the node level, which is already part of the operator's core responsibilities.

That said, I'm open to understanding if there's a specific risk or compatibility concern that a formal feature gate would help mitigate. Is there a scenario you foresee where this optional field could impact stability, upgrade paths, or API guarantees?
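For illustration, here is a sketch of how the proposed optional field might look on the DNS custom resource. The name/namespace shape is inferred from the Namespace field discussed later in this thread; the exact field names are subject to API review and should be treated as hypothetical:

```yaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  # Hypothetical field from RFE-4145: services whose cluster DNS names
  # should be resolvable via /etc/hosts on every node. If omitted, the
  # operator keeps today's default (image-registry.openshift-image-registry.svc).
  nodeServices:
  - name: image-registry
    namespace: openshift-image-registry
```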
@t-cas The purpose of the feature-gate process is to ensure that new features:
- Have functionality implemented before promotion to GA and thus being supported
- Have sufficient testing that reports into component readiness so we can detect regressions
More specifically for your feature, you mention:
> The goal of this change is to make an existing behavior configurable, not to introduce a fundamentally new capability. It extends the current functionality of injecting DNS service entries into /etc/hosts at the node level, which is already part of the operator's core responsibilities. That said, I'm open to understanding if there's a specific risk or compatibility concern that a formal feature gate would help mitigate. Is there a scenario you foresee where this optional field could impact stability, upgrade paths, or API guarantees?
This is the purpose of an enhancement proposal. It allows us to better understand the motivation of the feature, proposed implementation and evaluate the feature for some of those potential concerns. If this wasn't previously configurable, how does making it configurable impact assumptions users and other systems on an OpenShift cluster may have made about the existing behavior?
In the linked RFE, you mentioned "The node-resolver daemonset is updated dynamically when the custom resource changes." What does this mean? How do we roll out the node-resolver? Is there any chance of a rollout causing a disruption on the cluster?
These are all the kinds of questions I would expect to be answered by an EP.
@everettraven Thanks for the detailed explanation, this helps clarify the broader purpose of the feature gate process, and I fully appreciate its value in ensuring stability, readiness, and supportability of new features.
In this specific case, I'm trying to assess where the feature gate adds value. The node-resolver daemonset's role remains the same: injecting DNS service entries into /etc/hosts at the node level. The only change here is making the list of services configurable via a new optional field. If nodeServices is not set, the behavior remains exactly as it is today. In that sense, the field itself acts as a natural gate: if you don't use it, the feature is effectively disabled.

Regarding the rollout: the daemonset may be restarted when the custom resource changes, but this follows the standard Kubernetes DaemonSet rolling-update behavior. The node-resolver does not perform any live processing that could be impacted; it simply sets up /etc/hosts on the node where it runs. So in practice, restarting it has no observable impact on cluster operations.
Finally, I initially didn't expect this small change to require a dedicated enhancement proposal, since it doesn't really alter the behavior of the node-resolver daemonset. What do you think?

I'll make sure to include this clarification in the linked RFE. If there are specific risks or assumptions about the existing behavior that you think could be affected, I'd be happy to explore and address them.
@JoelSpeed thanks for all the feedback, the PR has been updated to integrate all your comments.

Could you let me know how I should proceed next with this change?
/assign
/assign @alebedev87
/ok-to-test
@JoelSpeed thanks for running the tests. I see they are failing because the linter expects the omitempty tag on the Namespace field, while this field is mandatory and cannot be empty, so adding this tag seems pretty useless to me in that case.

As far as I understood, the omitempty tag was only for optional fields.
Here is the exact error message:

```
operator/v1/types_dns.go:200:2: requiredfields: field Namespace should have the omitempty tag. (kubeapilinter)
	Namespace string `json:"namespace"`
	^
```
How should I proceed ?
@t-cas We are in the process of updating some of our API conventions here, and the linter is attempting to enforce a new one: a field for which the empty string ("") is not a valid value should never be included in the serialized output.
I don't think it is a huge deal here, but we are trying to follow the output of the linter where possible for consistency. I'd recommend applying the changes suggested by the linter.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@everettraven thanks for the feedback, PR updated
/retest
@t-cas: all tests passed!
Full PR test history. Your PR dashboard.
IMO, apart from the couple of comments awaiting the network team's opinion, the API here LGTM
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
[!IMPORTANT]
Review skipped
Auto reviews are limited based on label configuration.
:no_entry_sign: Review skipped — only excluded labels are configured. (1)
- do-not-merge/work-in-progress
Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command. You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.
Comment @coderabbitai help to get the list of available commands and usage tips.
/remove-lifecycle stale