api icon indicating copy to clipboard operation
api copied to clipboard

Support custom list of services to be added to /etc/hosts in cluster DNS operator - RFE-4145

Open t-cas opened this issue 6 months ago • 17 comments

prerequisite for PR https://github.com/openshift/cluster-dns-operator/pull/441 due to new field introduced in DNS CRD

t-cas avatar Aug 04 '25 13:08 t-cas

Hello @t-cas! Some important instructions when contributing to openshift/api: API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

openshift-ci[bot] avatar Aug 04 '25 13:08 openshift-ci[bot]

Hi @t-cas. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Aug 04 '25 13:08 openshift-ci[bot]

Do we have an EP explaining what this new feature is? As @everettraven mentions, this should be behind a feature gate as a new feature extending an existing API

JoelSpeed avatar Aug 06 '25 15:08 JoelSpeed

Do we have an EP explaining what this new feature is? As @everettraven mentions, this should be behind a feature gate as a new feature extending an existing API

@JoelSpeed I just added more details about the proposal as part of RFE https://issues.redhat.com/browse/RFE-4145, tell me if you have any questions or need more details.

Also, thanks for the suggestion regarding the feature gate but I’d like to clarify the rationale behind not introducing one in this case. This enhancement adds a new optional field (nodeServices) to the existing DNS.operator.openshift.io API. If this field is not set, the behavior remains exactly as it is today, with the default hardcoded service (image-registry.openshift-image-registry.svc) being used. In that sense, the current implementation already acts as a natural feature gate: if the field is unset, the feature is effectively disabled. The goal of this change is to make an existing behavior configurable, not to introduce a fundamentally new capability. It extends the current functionality of injecting DNS service entries into /etc/hosts at the node level, which is already part of the operator’s core responsibilities. That said, I’m open to understanding if there’s a specific risk or compatibility concern that a formal feature gate would help mitigate. Is there a scenario you foresee where this optional field could impact stability, upgrade paths, or API guarantees?

t-cas avatar Aug 07 '25 12:08 t-cas

@t-cas The purpose of the feature-gate process is to ensure that new features:

  • Have functionality implemented before promotion to GA and thus being supported
  • Have sufficient testing that reports into component readiness so we can detect regressions

More specifically for your feature, you mention:

The goal of this change is to make an existing behavior configurable, not to introduce a fundamentally new capability. It extends the current functionality of injecting DNS service entries into /etc/hosts at the node level, which is already part of the operator’s core responsibilities. That said, I’m open to understanding if there’s a specific risk or compatibility concern that a formal feature gate would help mitigate. Is there a scenario you foresee where this optional field could impact stability, upgrade paths, or API guarantees?

This is the purpose of an enhancement proposal. It allows us to better understand the motivation of the feature, proposed implementation and evaluate the feature for some of those potential concerns. If this wasn't previously configurable, how does making it configurable impact assumptions users and other systems on an OpenShift cluster may have made about the existing behavior?

If the linked RFE, you mentioned The node-resolver daemonset is updated dynamically when the custom resource changes. - what does this mean? How do we rollout the node-resolver? Is there any chance of a rollout causing a disruption on the cluster?

These are all the kinds of questions I would expect to be answered by an EP.

everettraven avatar Aug 07 '25 14:08 everettraven

@everettraven Thanks for the detailed explanation, this helps clarify the broader purpose of the feature gate process, and I fully appreciate its value in ensuring stability, readiness, and supportability of new features.

In this specific case, I’m trying to assess where the feature gate adds value. The node-resolver daemonset’s role remains the same: injecting DNS service entries into /etc/hosts at node level. The only change here is making the list of services configurable via a new optional field. If nodeServices is not set, the behavior remains exactly as it is today. In that sense, the field itself acts as a natural gate, if you don’t use it, the feature is effectively disabled.

Regarding the rollout: the daemonset may be restarted when the custom resource changes, but this follows standard Kubernetes rolling update daemonset behavior. The node-resolver does not perform any live processing that could be impacted; it simply sets up /etc/hosts on the node where it runs. So in practice, restarting it has no observable impact on cluster operations.

Finally, I initially didn’t expect this small change to require a dedicated enhancement proposal, since it doesn’t really alter the behavior of the node-resolver daemonset, what do you think ?

I’ll make sure to include this clarification in linked RFE. If there are specific risks or assumptions about the existing behavior that you think could be affected, I’d be happy to explore and address them.

t-cas avatar Aug 07 '25 14:08 t-cas

@JoelSpeed thanks for all the feedback, PR has been updated to integrate all your comments

Please tell me for the next steps how should I proceed with that change

t-cas avatar Aug 08 '25 11:08 t-cas

/assign

Miciah avatar Aug 13 '25 14:08 Miciah

/assign @alebedev87

candita avatar Aug 13 '25 15:08 candita

/ok-to-test

JoelSpeed avatar Sep 24 '25 10:09 JoelSpeed

@JoelSpeed thanks for running test, I see they are failing but because it expects to have omitempty tag on Namespace field while this field is mandatory and cannot be empty, so seems pretty useless to me in that case to add this tag. For me, omitempty tag was only for optional fields.

Here is the exact error message:

operator/v1/types_dns.go:200:2: requiredfields: field Namespace should have the omitempty tag. (kubeapilinter)
	Namespace string `json:"namespace"`
	^

How should I proceed ?

t-cas avatar Sep 26 '25 09:09 t-cas

@t-cas We are in the process of updating some of our API conventions here and the linter is attempting to enforce a new one where a field where the empty string ("") is not a valid value should never be included in the serialized output.

I don't think it is a huge deal here, but we are trying to follow the output of the linter where possible for consistency. I'd recommend applying the changes suggested by the linter.

everettraven avatar Sep 29 '25 13:09 everettraven

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Sep 30 '25 12:09 openshift-ci[bot]

@everettraven thanks for the feedback, PR updated

t-cas avatar Sep 30 '25 12:09 t-cas

/retest

t-cas avatar Oct 01 '25 13:10 t-cas

@t-cas: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Oct 01 '25 14:10 openshift-ci[bot]

IMO, outside of the couple of comments about the network team opinion, API here LGTM

JoelSpeed avatar Oct 03 '25 11:10 JoelSpeed

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Jan 26 '26 01:01 openshift-bot

[!IMPORTANT]

Review skipped

Auto reviews are limited based on label configuration.

:no_entry_sign: Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Jan 26 '26 01:01 coderabbitai[bot]

/remove-lifecycle stale

t-cas avatar Jan 26 '26 07:01 t-cas