sriov-network-device-plugin icon indicating copy to clipboard operation
sriov-network-device-plugin copied to clipboard

Introduce Process for API changes, experimental features and feature gates

Open adrianchiris opened this issue 4 years ago • 10 comments

As Support for new features are added to sriov-network-device-plugin, these features can:

  1. add and/or modify functionality of device plugin
  2. introduce new user-facing APIs (and possibly deprecate an old user-facing API)
  3. support new "south bound" facing APIs e.g kubelet, device-info-spec (this is similar to 1 IMO as the API is already defined and device plugin just implements a new functionality)

For these features it may be desired to:

  1. Introduce feature gates to enable-disable support and functionality in device plugin
  2. Mark these features (and their APIs) as experimental as they are not fully mature and their user-facing APIs may change in subsequent releases or removed altogether.
  3. Mark an existing API as deprecated and instruct users to use the new API (eventually removing the old API)

Ive created this issue to trigger an open discussion on the issues and see if we can come with a clean way to handle it.

adrianchiris avatar Jan 31 '21 11:01 adrianchiris

As Support for new features are added to sriov-network-device-plugin, these features can:

1. add and/or modify functionality of device plugin

2. introduce new user-facing APIs (and possibly deprecate an old user-facing API)

3. support new "south bound" facing APIs e.g kubelet, device-info-spec (this is similar to 1 IMO as the API is already defined and device plugin just implements a new functionality)

For these features it may be desired to:

1. Introduce feature gates to enable-disable support and functionality in device plugin

so this feature gate will be a new cli cmd option, and the corresponding new feature will only be evaluated when this feature gate is true?

2. Mark these features (and their APIs) as experimental as they are not fully mature and their user-facing APIs may change in subsequent releases or removed altogether.

Is this to mark it as experimental or deprecated in the doc or in the code (e.g. warning message that user can be notified when configuring the feature)?

Since we are targeting new API process, do we really need featureGate? or just use the API created for that feature for enablement or disablement (e.g. if API is defined in cli or configMap, then it is enabled by user, vice versa)?

zshi-redhat avatar Feb 01 '21 02:02 zshi-redhat

so this feature gate will be a new cli cmd option, and the corresponding new feature will only be evaluated when this feature gate is true?

Yes, thats what i had in mind. similarly to what is done in K8s feature gates. We can have the feature gates as part of configMap as well (eg a new featureGates field) if we want and cli flags will override those.

Is this to mark it as experimental or deprecated in the doc or in the code (e.g. warning message that user can be notified when configuring the feature)?

Im thinking the following:

maintain a table in docs for features which states:

  • Feature name
  • Feature description
  • When it was introduced
  • its stage (alpha, beta, ga with similar semantics as k8s feature gates stages)
  • whether or not this feature is deprecated, with the version it was deprecated on.

For deprecated features we shall also have a deprecation warning emitted in log. a deprecated feature will be removed 2 releases or 6 months after it was marked as deprecated (the later of the two)

Since we are targeting new API process, do we really need featureGate? or just use the API created for that feature for enablement or disablement (e.g. if API is defined in cli or configMap, then it is enabled by user, vice versa)?

A new feature may or may not have a user facing API, hence i think we should have the separation.

adrianchiris avatar Feb 01 '21 12:02 adrianchiris

@adrianchiris I think this needs careful thought before introducing - How much work involving for introducing this. Do you know of any example project that implements feature gates like Kubernetes? Id like to understand the code changes needed to turn off and off existing features. Also, if we want to go down this route and we need a consensus on depreciating a feature, can we agree on full acceptance from all maintainers?

martinkennelly avatar Feb 03 '21 13:02 martinkennelly

Do you know of any example project that implements feature gates like Kubernetes? Id like to understand the code changes needed to turn off and off existing features.

Kubernetes is one example :) . ovn-kubernetes has some knobs that disables/enables functionality which is introduced in new features as well as some form of feature gate: https://github.com/ovn-org/ovn-kubernetes/blob/0a2aad51333400e28fbf51c1017b3064d61186cc/go-controller/pkg/config/config.go#L218

Also, if we want to go down this route and we need a consensus on depreciating a feature, can we agree on full acceptance from all maintainers?

My 2-cents here:

Alpha feature - removal should be by majority of maintainers (not deprecation here as it may be removed at the subsequent release). Beta feature - deprecation should have full acceptance from all maintainers. GA feature - deprecation should have full acceptance from all maintainers, although i suspect these will be deprecated if the feature is irrelevant, has alternatives and next to none consumers.

Not every feature will require a feature gate. As an example:

  • Adding a new selector, should we add a feature gate to this selector ? IMO no as it extends an existing API and is not used if not specified
  • Adding Support to provide NUMA information to kubelet ? IMO yes as it satisfies a new kubelet API and affects pods scheduling. (and with PR #320 ) it will also introduce new APIs for the user.

so it may be on a per-enhancement basis if a feature gate is to be added.

Downside of that is that the list of features tracked in docs and their state may be disjoint from the feature gates, unless we say that we track a state (alpha, beta, GA, deprecated) of a feature only if it has a feature gate. And a feature will have a feature gate if it is "substantial" . Evaluated per-feature depending of the impact on the project (API, behavioral)

adrianchiris avatar Feb 08 '21 09:02 adrianchiris

Overall, I think this brings a lot of benefits

Downside of that is that the list of features tracked in docs and their state may be disjoint from the feature gates, unless we say that we track a state (alpha, beta, GA, deprecated) of a feature only if it has a feature gate. And a feature will have a feature gate if it is "substantial" . Evaluated per-feature depending of the impact on the project (API, behavioral)

I think just tracking the features for which there is a feature gate is reasonable. Other features should be documented elsewhere anyhow.

Some thoughts implementation-wise: I guess this should mean we add a global Config struct that is accessible from all over the codebase? Should it live in it's own sub-package to avoid adding inter-package dependencies or could we reuse e,g: factory? What kind of logic will end up in such package? I'm thinking at least the struct initialization based on cli and configMap options (including solving potential feature interdependency). But this should be devicetype-agnostic, meaning the activation of a feature cannot depend on, say, the network-specific capabilities of the node. I cannot think of any feature that might need this, but just saying it out loud.

Also, the current implementation uses []ResourceConfig to unmarshall the configMap content, we'll have to add a global config space, right? Or are you thinking of making the feature gates per-pool?

amorenoz avatar Mar 01 '21 09:03 amorenoz

@amorenoz i think we have the same idea about implementation approach

exact details can be discussed over the PR if we have a general agreement on adding support for feature gates.

implementation points:

  • global Config structs in its own sub-package to avoid storing a reference in all relevant internal structs. im thinking to avoid use factory.
  • Initialized globally with defaults, overriden by configMap and cli params (in that order)
  • devicetype/hardware/node-config agnostic, essentially a feature gate on/off value is derrived from:
    • hardcoded defaults
    • configMap
    • Cli params
    • other feature gates Dependencies
  • In regards to ConfigMap, im thinking to have a new featureGates top level member which will affect device plugin functionallity in a global manner. If a certain feature has a per resource knobs its fine, but the feature is enabled/disabled globally.

adrianchiris avatar Mar 01 '21 11:03 adrianchiris

  • global Config structs in its own sub-package to avoid storing a reference in all relevant internal structs

@adrianchiris Does this mean a singleton basically?

Also, do you think that this new config should be added to the existing configMap, or should it be a separate configMap (e.g. renaming existing configMap.yml to, let's say, discoveryConfig.yml and creating addtional featuresConfig.yml?

Additionally I am not sure about having both Config and featureGates. Couldn't all the relevant information be stored in featureGates itself? Just to avoid duplicating the values.

ipatrykx avatar Apr 07 '21 09:04 ipatrykx

@adrianchiris Does this mean a singleton basically?

Yep

Also, do you think that this new config should be added to the existing configMap, or should it be a separate configMap (e.g. > renaming existing configMap.yml to, let's say, discoveryConfig.yml and creating addtional featuresConfig.yml?

I think it should be in the same Map, later on we can discuss if we need to separate the resource configuration from device plugin configuration.

Additionally I am not sure about having both Config and featureGates. Couldn't all the relevant information be stored in featureGates itself? Just to avoid duplicating the values.

Well a featureGate enables or disables a feature. A feature may or may not be associated with a set of Config parameters. so i dont think you can roll all of it into featureGates.

adrianchiris avatar Apr 07 '21 09:04 adrianchiris

Just created a PR on this for initial discussion.

ipatrykx avatar Apr 12 '21 11:04 ipatrykx

@adrianchiris @zshi-redhat @amorenoz @martinkennelly

I was supposed to present this today, but we ran out of time. So, to speed things up, I am attaching this simple presentation here.

SRIOV DP featuregates.pdf

Also, the concerns that are specified in the last slide:

  • FeatureGate configuration precedence - should CLI args override ConfigMap parameters? When DP is running as a pod it seems to be easier to change ConfigMap than CLI args.

  • Deprecation message – should emitted deprecation message be generic (e.g. WARNING: feature ABC is deprecated), or should be feature-specific (provided in config for example)?

  • Should there be special idioms like e.g. ‘allAlphaFeatures’ that would disable/enable groups of features (in this case disable/anable all alpha features)?

  • Should there be a way to block the feature disabling/enabling (e.g. there is a GA feature that user should not be able to disable)?

ipatrykx avatar Apr 12 '21 15:04 ipatrykx