kubernetes-nmstate

RFE: Configure static IPs on a pool of nodes with a single Policy

Open nabbas-ca opened this issue 3 years ago • 28 comments

What happened: When trying to set a static IP on an extra interface (in my case a non-routable IP for an internal network), I had to create a different policy for every node: a node selector for each node, and one policy per IP to be assigned. This translates to multiple NNCP objects (order of x) and many more NNCE objects (order of x^2), all with exactly the same configuration except for the IP.

What you expected to happen: I would like a way to have only one policy for multiple IPs across all the nodes, maybe using an IP pool, a formula, or any other mechanism. That would remove the need to duplicate configuration YAMLs and would make NNCE objects more scalable.

How to reproduce it (as minimally and precisely as possible): Sample NNCP YAML:

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond1-policy
spec:
  nodeSelector:
    kubernetes.io/hostname: compute-0.k8s.com
  desiredState:
    interfaces:
    - name: bond1 
      description: Bond enslaving ens3f0 and ens3f1 
      type: bond 
      state: up 
      ipv4:
        address:
        - ip: 169.100.100.15
          prefix-length: 24
        dhcp: false 
        enabled: true 
      link-aggregation:
        mode: 802.3ad 
        options:
          miimon: '140' 
        slaves: 
        - ens3f0
        - ens3f1
      mtu: 1500 
  • Create multiple YAML files like the one above, for different nodes and different IPs but otherwise the same configuration, and apply all of them.
  • Run oc get nncp; you should get x objects.
  • Run oc get nnce; you should get x^2 objects, of which only x are successfully configured. The rest are mismatched because of the selector.

Anything else we need to know?:

Environment: This is within Openshift 4.7 on baremetal environment.

nabbas-ca avatar Apr 15 '21 09:04 nabbas-ca

Hello. This is indeed a valid use-case we've been looking into for a while. The API should be ready for this, this is one of the reasons we have desiredState in each individual NodeNetworkConfigurationEnactment - anticipating they will differ.

That being said, with the exception of some API drafts, no work has been done on this front yet. Although we anticipated this as a real use case, there have not been many user requests for it.

It was discussed whether this API should be a part of core nmstate, but the answer is negative https://bugzilla.redhat.com/show_bug.cgi?id=1934514. This new API is, unlike the rest of nmstate, stateful.

The current direction is that we should introduce a new library as a part of the nmstate project that would abstract this stateful configuration. This would also help us assure backwards compatibility of the API. Once we have such a library, integrating it to kubernetes-nmstate should not be too difficult.

Long story short, this is on our radar, but without a specific date attached to it.

phoracek avatar Apr 15 '21 14:04 phoracek

Thanks @phoracek. In the meantime, do you foresee scalability issues with this, for example if there are 500 or 1000 NNCE objects on a cluster?

nabbas-ca avatar Apr 18 '21 06:04 nabbas-ca

It may be. I don't know about this specific case, but we have seen scalability issues dependent on number of Nodes in the cluster. We are currently working on addressing that through https://github.com/kubernetes-sigs/controller-runtime/pull/1435. There is only one way to find out I'm afraid.

phoracek avatar Apr 22 '21 14:04 phoracek

I think I am in a similar boat: I need to assign a non-routable IP to each node. As already mentioned, the number of NNCEs is huge.

vishnoisuresh avatar Apr 22 '21 17:04 vishnoisuresh

Templating may be quite difficult to implement. On the other hand, we can partially relieve your issue by changing the NNCE logic, so that an NNCE gets created only for a matching node. I'm not sure if it's doable, since a lot of our logic is based on the NNCE being available, but it would be worth researching.
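To put numbers on the difference, here is a minimal arithmetic sketch. It assumes one policy per node and one matching node per policy, as in the report above; the function name and the `matching_per_policy` parameter are illustrative only and are not part of kubernetes-nmstate.

```python
def nnce_counts(policies: int, nodes: int, matching_per_policy: int = 1) -> dict:
    """Compare NNCE counts under the two creation strategies discussed here."""
    return {
        # Behaviour reported in this issue: one enactment per (policy, node) pair.
        "per_policy_and_node": policies * nodes,
        # Proposed change: enactments only for nodes the policy's selector matches.
        "matching_nodes_only": policies * matching_per_policy,
    }


print(nnce_counts(policies=500, nodes=500))
```

With 500 nodes and 500 single-node policies this is 250000 enactments versus 500.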

cc @qinqon @rhrazdil

phoracek avatar Apr 26 '21 16:04 phoracek

On the other hand, we can partially relieve your issue by changing the NNCE logic, so that an NNCE gets created only for a matching node.

@phoracek Yea I think, It would be also convenient, as it will remove visual noise of unnecessary NNCE resources.

vishnoisuresh avatar Apr 27 '21 11:04 vishnoisuresh

@phoracek, another idea is to have an IP pool object, from which the policy assigns IPs, failing if there are no more IPs left. Something along these lines would work for us, as we don't really care which IPs are used, as long as they come from a pool.

Another idea, make the ip address like a formula/function. 169.0.0.x+10 or something like that.
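A minimal sketch of the formula idea, assuming each node has a stable integer index (how that index would be obtained is not specified in this thread). The function name, the default offset of 10 and the /24 default are illustrative assumptions.

```python
import ipaddress


def ip_from_index(base: str, node_index: int, offset: int = 10, prefix_length: int = 24) -> str:
    """Return base + node_index + offset, refusing addresses outside the subnet."""
    network = ipaddress.ip_network(f"{base}/{prefix_length}", strict=False)
    candidate = ipaddress.ip_address(base) + node_index + offset
    # Reject anything that left the subnet or hit its network/broadcast address.
    if candidate not in network or candidate in (network.network_address, network.broadcast_address):
        raise ValueError(f"{candidate} is not a usable host address in {network}")
    return str(candidate)


for index in range(3):
    print(index, ip_from_index("169.0.0.0", index))
```

The formula is only as stable as the node index: if indexes shift when nodes are removed, addresses shift with them.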

nabbas-ca avatar Apr 27 '21 12:04 nabbas-ca

On the other hand, we can partially relieve your issue by changing the NNCE logic, so that an NNCE gets created only for a matching node.

@phoracek Yea I think, It would be also convenient, as it will remove visual noise of unnecessary NNCE resources.

I am going to prepare a PR for that.

qinqon avatar Apr 29 '21 12:04 qinqon

@nabbas-ca I would be against turning knmstate into a cluster IP management tool. The scope of the project is to provide a simple Kubernetes-native way to configure networking on cluster nodes. This configuration is "static" and nodes are not depending on each other. Introducing a non-naive IPAM support would go way beyond this.

Once @qinqon gets rid of those idle NNCEs, hopefully the resource consumption will become acceptable. The knmstate API should allow a third-party operator to use it for IPAM: one could create an operator that, upon Node creation, creates an NNCP with a unique IP assigned to the node.
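As a rough sketch of what such an external tool would generate, the following builds one NNCP manifest per node as plain dictionaries. It is not an operator and talks to no cluster; the helper name, the policy naming scheme and the starting offset are assumptions, and only a subset of the bond settings from the sample policy is reproduced.

```python
import ipaddress
import json


def build_policies(node_names, subnet, first_offset=10):
    """Build one NodeNetworkConfigurationPolicy dict per node, each with its own IP."""
    network = ipaddress.ip_network(subnet)
    policies = []
    for i, node in enumerate(sorted(node_names)):
        address = network.network_address + first_offset + i
        if address not in network or address == network.broadcast_address:
            raise ValueError(f"subnet {subnet} exhausted at node {node}")
        policies.append({
            "apiVersion": "nmstate.io/v1beta1",
            "kind": "NodeNetworkConfigurationPolicy",
            # Hypothetical naming scheme: bond1-<short hostname>.
            "metadata": {"name": f"bond1-{node.split('.')[0]}"},
            "spec": {
                "nodeSelector": {"kubernetes.io/hostname": node},
                "desiredState": {"interfaces": [{
                    "name": "bond1",
                    "type": "bond",
                    "state": "up",
                    "ipv4": {
                        "address": [{"ip": str(address), "prefix-length": network.prefixlen}],
                        "dhcp": False,
                        "enabled": True,
                    },
                }]},
            },
        })
    return policies


print(json.dumps(build_policies(["compute-0.k8s.com", "compute-1.k8s.com"], "169.100.100.0/24"), indent=2))
```

Assigning by sorted position is the simplest possible scheme and is not stable when nodes are added or removed; a real tool would need to remember earlier assignments.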

phoracek avatar Apr 29 '21 12:04 phoracek

@phoracek , I understand your concerns. But from my perspective, all these NNCP objects are identical , except for the ip. At least make this ip specification somehow extendable, so that I don't have that many policies, basically with the exact same configuration. Editing those will be painful as well.

Just trying to give the users perspective here. It is a valid use case to try to use static ips, and Kubernetes nmstate supports it, but not optimally, allowing possible errors by the user, with not an optimal user experience. Compare that to the dhcp experience, it is 1 policy, with many enactments, works really well.

We even considered running an artificial DHCP service on the switches just to hand out the IPs, but adding an external server just for this seems to open another can of worms. I would rather have many NNCP objects, automated, than add a technically unnecessary external service.

nabbas-ca avatar Apr 29 '21 15:04 nabbas-ca

@phoracek , I understand your concerns. But from my perspective, all these NNCP objects are identical , except for the ip. At least make this ip specification somehow extendable, so that I don't have that many policies, basically with the exact same configuration. Editing those will be painful as well.

As commented above, having templating support is certainly something we'd like to have. Would you be open to filing an RFE describing use cases for this, to outline the requirements? Note that it should be generic enough that we don't need to manage any specific attribute, like IP.

Just trying to give the users perspective here. It is a valid use case to try to use static ips, and Kubernetes nmstate supports it, but not optimally, allowing possible errors by the user, with not an optimal user experience. Compare that to the dhcp experience, it is 1 policy, with many enactments, works really well.

Would templating make it easier to handle? I'm not convinced that kubernetes-nmstate should support cluster-wide configuration logic - understand the payload of desiredState and make it consistent. Moreover, this should be easy to implement as a dedicated operator interfacing with kubernetes-nmstate API. You could treat it as a PoC. If it proves easy to correctly handle, we could discuss whether it should be integrated to kubernetes-nmstate.

phoracek avatar Apr 30 '21 13:04 phoracek

@phoracek , I understand your concerns. But from my perspective, all these NNCP objects are identical , except for the ip. At least make this ip specification somehow extendable, so that I don't have that many policies, basically with the exact same configuration. Editing those will be painful as well.

As commented above, having templating support is certainly something we'd like to have. Would you be open to filing an RFE describing use cases for this, to outline the requirements? Note that it should be generic enough that we don't need to manage any specific attribute, like IP.

@phoracek, what exactly do you mean by templating support? I'm open to filing an RFE, but I can only think of my own use case, and I'm not sure how to make it generic.

Just trying to give the users perspective here. It is a valid use case to try to use static ips, and Kubernetes nmstate supports it, but not optimally, allowing possible errors by the user, with not an optimal user experience. Compare that to the dhcp experience, it is 1 policy, with many enactments, works really well.

Would templating make it easier to handle? I'm not convinced that kubernetes-nmstate should support cluster-wide configuration logic - understand the payload of desiredState and make it consistent. Moreover, this should be easy to implement as a dedicated operator interfacing with kubernetes-nmstate API. You could treat it as a PoC. If it proves easy to correctly handle, we could discuss whether it should be integrated to kubernetes-nmstate.

I'm open to writing an operator for this. This looks like a good idea.

nabbas-ca avatar Apr 30 '21 15:04 nabbas-ca

Wonderful! I would be interested to hear how that goes. And of course, if you'd have issues using the API or any other questions, don't hesitate to reach out.

phoracek avatar Apr 30 '21 15:04 phoracek

@phoracek, I will keep you updated of course. Can you point me to a good Go example of using the kubernetes-nmstate API? That would be a helpful start.

As for the RFE, I would like your input on how to make it generic. Also, what's the process of filing an RFE?

nabbas-ca avatar Apr 30 '21 16:04 nabbas-ca

The only examples of API usage I know of are our e2e tests, so you may want to start there. They use the controller-runtime client and should be pretty straightforward.

For RFE, let's start with a GitHub issue, describe user stories connected to the feature, what you want to accomplish, what are the non-goals, what are risks of this, offer draft of the API (optional), describe how it will affect compatibility with previous API versions and whether there is something we need to do during upgrades. Consider what happens on a cluster that is in a middle of an upgrade (part of it has the new feature, part does not).

The more information you gather, the better. However, note that getting into implementation details would be premature and may sway the discussion.

Once we have that draft, we can discuss it below the Issue. If we feel that the comment section is not good enough anymore, we could move the proposal to a Google document or hackmd.

These are just suggestions, there is no process set in stone.

phoracek avatar May 03 '21 08:05 phoracek

@nabbas-ca did you get a chance to look into this at all?

BTW we started looking into the feature to copy NIC's IP to a bridge - to allow single policy per cluster

phoracek avatar Jul 14 '21 10:07 phoracek

@phoracek : what did you exactly mean when you mentioned templating? I might think of using node annotations to store static IPs and allow some go templating in the nncp interfaces map. But I am not sure if it's the same as your idea of templating...

brutus333 avatar Aug 12 '21 12:08 brutus333

@brutus333 that is awesome to hear, as taking data from annotations was something we were planning to do. The "templating" mechanism should allow us to reference data from state/annotations and to automatically move IPs between interfaces too. We are in an early phase of designing this feature, so there are still multiple options being considered.

cc @EdDev @AlonaKaplan

phoracek avatar Aug 12 '21 12:08 phoracek

I would like to see a policy that lets me express "allocate a static address within the specified subnet, unique in this cluster, preferring the one already used".

...
  desiredState:
    interfaces:
    - name: bond1 
      type: bond 
      state: up 
      ipv4:
        address:
        - ip: 169.100.100.00/28   # << suggested notation 
          prefix-length: 24
        dhcp: false 
        enabled: true 

but something less smart such as "take the address from a specific node label" may be helpful too.

...
  desiredState:
    interfaces:
    - name: bond1 
      type: bond 
      state: up 
      ipv4:
        address:
        - ip: "{{ node.label.bond-ipv4-address }}"   # << suggested notation 
          prefix-length: 24
        dhcp: false 
        enabled: true 
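A minimal sketch of how the "{{ node.label.<name> }}" notation suggested above could be resolved against one node's labels. The placeholder syntax is only a proposal in this thread and the `render` helper is hypothetical; nothing here is an existing kubernetes-nmstate feature.

```python
import re

# Matches the suggested notation, e.g. "{{ node.label.bond-ipv4-address }}".
PLACEHOLDER = re.compile(r"\{\{\s*node\.label\.([A-Za-z0-9._/-]+)\s*\}\}")


def render(value, labels):
    """Recursively replace label placeholders in a desiredState-like structure."""
    if isinstance(value, dict):
        return {key: render(item, labels) for key, item in value.items()}
    if isinstance(value, list):
        return [render(item, labels) for item in value]
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in labels:
                # Fail loudly rather than apply a half-rendered configuration.
                raise KeyError(f"node has no label {name!r}")
            return labels[name]
        return PLACEHOLDER.sub(substitute, value)
    return value  # numbers, booleans and None pass through unchanged


template = {"ipv4": {"address": [{"ip": "{{ node.label.bond-ipv4-address }}", "prefix-length": 24}],
                     "dhcp": False, "enabled": True}}
print(render(template, {"bond-ipv4-address": "169.100.100.15"}))
```

Raising on a missing label keeps a node without an assigned address from receiving a literal placeholder as its IP.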

dankenigsberg avatar Dec 20 '21 11:12 dankenigsberg

I would like to see a policy that lets me express "allocate a static address within the specified subnet, unique in this cluster, preferring the one already used".

This will push us into the business of managing pools and allocating addresses. I do not think we are looking to go in that direction. A softer option could be to create the interfaces from a range at the time of resolving the expression into enactments. But this will not check whether such addresses exist in other policies, enactments or nodes in general.

but something less smart such as "take the address from a specific node label" may be helpful too.

This one I like best. Someone else picks the address and k-nmstate just applies it on the configuration.

Both seem to solve a need. The question is which one fits the scenario better, or maybe which one is less likely to mess up the whole setup.

EdDev avatar Dec 20 '21 11:12 EdDev

I would like to see a policy that lets me express "allocate a static address within the specified subnet, unique in this cluster, preferring the one already used".

This will push us into the business of managing pools and allocating addresses. I do not think we are looking to go in that direction.

This is what customers want; knmstate may delegate this to a different component.

A softer option could be to create the interfaces from a range at the time of resolving the expression into enactments. But this will not check whether such addresses exist in other policies, enactments or nodes in general.

I'm fine with this as long as

  • it works for nodes that are added after the fact, and recycles IPs of nodes that are removed
  • A policy fails with a clear error if one of the IP ranges in it intersects with another range.
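The two conditions above can be sketched roughly as follows, assuming the allocator is handed the current node list and the previous node-to-IP assignments. Both function names are hypothetical, and where the previous assignments would be stored is left open.

```python
import ipaddress


def check_no_overlap(ranges):
    """Fail with a clear error if any two of the given CIDR ranges intersect."""
    networks = [ipaddress.ip_network(r) for r in ranges]
    for i, first in enumerate(networks):
        for second in networks[i + 1:]:
            if first.overlaps(second):
                raise ValueError(f"IP ranges {first} and {second} intersect")


def allocate(nodes, subnet, current):
    """Return {node: ip}, keeping existing assignments and recycling freed addresses."""
    network = ipaddress.ip_network(subnet)
    # Keep assignments only for nodes that still exist and still fit the range;
    # addresses of removed nodes become free again.
    result = {node: ip for node, ip in current.items()
              if node in nodes and ipaddress.ip_address(ip) in network}
    used = set(result.values())
    free = (str(host) for host in network.hosts() if str(host) not in used)
    for node in sorted(nodes):
        if node not in result:  # a node added after the fact
            try:
                result[node] = next(free)
            except StopIteration:
                raise ValueError(f"no free address left in {subnet} for {node}") from None
    return result


check_no_overlap(["192.168.0.0/23", "192.168.2.0/23"])
print(allocate(["a", "b", "c"], "169.100.100.0/28",
               {"a": "169.100.100.5", "gone": "169.100.100.1"}))
```

In the usage example, node "a" keeps its address, while the address of the removed node "gone" is handed to the first new node.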

but something less smart such as "take the address from a specific node label" may be helpful too.

This one I like best. Someone else picks the address and k-nmstate just applies it on the configuration.

This can work for me - as long as I can configure the "something else" with something like

nodeAddress:
  nodelabel1: 192.168.0.0/23
  nodelabel2: 192.168.2.0/23

and it takes care to label each node with two unique addresses.

Both seem to solve a need. The question is which one fits the scenario better, or maybe which one is less likely to mess up the whole setup.

Another question is which of the two is easier to understand and use.

dankenigsberg avatar Dec 20 '21 13:12 dankenigsberg

I have been talking with @maiqueb. He is in contact with the whereabouts team, and it looks like creating an API for their IPPool feature is something they would like to have. We are going to meet with them and present the knmstate use case.

qinqon avatar Jan 25 '22 14:01 qinqon

I want to put one restraint on whatever is chosen here as a valid solution:

  • k-nmstate should not depend on any other tool directly. It should surely not query some CRD or call it using gRPC or alike. I think it is fine for some tool/service to allocate addresses in a format that k-nmstate can use indirectly (e.g. create a config-map, then k-nmstate taking that and overriding the desired state at the enactment).

EdDev avatar Jan 26 '22 06:01 EdDev

I want to put one restraint on whatever is chosen here as a valid solution:

  • k-nmstate should not depend on any other tool directly. It should surely not query some CRD or call it using gRPC or alike. I think it is fine for some tool/service to allocate addresses in a format that k-nmstate can use indirectly (e.g. create a config-map, then k-nmstate taking that and overriding the desired state at the enactment).

You mean directly consuming the configmap from a tool like whereabouts? Another option would be to consume whereabouts as a library to reuse the reserve/deallocate logic.

qinqon avatar Jan 26 '22 08:01 qinqon

I want to put one restraint on whatever is chosen here as a valid solution:

  • k-nmstate should not depend on any other tool directly. It should surely not query some CRD or call it using gRPC or alike. I think it is fine for some tool/service to allocate addresses in a format that k-nmstate can use indirectly (e.g. create a config-map, then k-nmstate taking that and overriding the desired state at the enactment).

Makes sense to not depend on whereabouts for IPAM, even if it would be the single IPAM provider for the time being.

Correct me if I'm wrong @EdDev, but an IPAM interface - where whereabouts is only one type of provider - would address your concerns. What it would look like (i.e. which communication medium, what a request/reply looks like, etc.) is still to be designed.

maiqueb avatar Jan 26 '22 09:01 maiqueb

You mean directly consuming the configmap from a tool like whereabouts? Another option would be to consume whereabouts as a library to reuse the reserve/deallocate logic.

I mean that I do not want k-nmstate to know about whereabouts, or any other solution. It should be completely decoupled from the provider of this functionality. Either that service (e.g. whereabouts) will annotate something, add a configmap or whatever else is possible so k-nmstate can consume it as input, or let some 3rd controller do that (connect the dots).
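A minimal sketch of that decoupling, assuming some external component has already written per-node addresses into a config-map-like mapping. The data layout (node name, then interface name, then "ip/prefix") and the `apply_overrides` helper are invented for illustration; the thread does not define a format.

```python
import copy

# Hypothetical data produced by an external allocator; not a defined format.
config_map_data = {
    "compute-0.k8s.com": {"bond1": "169.100.100.15/24"},
    "compute-1.k8s.com": {"bond1": "169.100.100.16/24"},
}


def apply_overrides(desired_state, overrides_for_node):
    """Return a copy of desired_state with static IPv4 addresses filled in per interface."""
    state = copy.deepcopy(desired_state)  # never mutate the shared policy
    for iface in state.get("interfaces", []):
        override = overrides_for_node.get(iface["name"])
        if override is None:
            continue  # no address was allocated for this interface
        ip, prefix = override.split("/")
        iface.setdefault("ipv4", {}).update({
            "enabled": True,
            "dhcp": False,
            "address": [{"ip": ip, "prefix-length": int(prefix)}],
        })
    return state


policy_state = {"interfaces": [{"name": "bond1", "type": "bond", "state": "up"}]}
print(apply_overrides(policy_state, config_map_data["compute-0.k8s.com"]))
```

The consumer only reads plain data, so it does not need to know which tool produced it.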

EdDev avatar Jan 26 '22 09:01 EdDev

Correct me if I'm wrong @EdDev, but an IPAM interface - where whereabouts is only one type of provider - would address your concerns. What it would look like (i.e. which communication medium, what a request/reply looks like, etc.) is still to be designed.

IPAM in the context we are discussing is one such "interface", I guess. So I think we are in sync here.

EdDev avatar Jan 26 '22 09:01 EdDev

You mean consume directly the configmap from a tool like whereabouts ? another option would be to consume whereabouts as a library to re-use the reserve/deallocate logic.

I mean that I do not want k-nmstate to know about whereabouts, or any other solution. It should be completely decoupled from the provider of this functionality. Either that service (e.g. whereabouts) will annotate something, add a configmap or whatever else is possible so k-nmstate can consume it as input, or let some 3rd controller do that (connect the dots).

I misunderstood. Would creating some IPAM client interface, with the implementation selected at runtime, be enough?
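One possible shape for such an interface, sketched in Python for brevity. The `IPAMClient` protocol, the provider registry and the `static-map` provider are all hypothetical; a whereabouts-backed provider would be registered the same way but is not shown, since its API is not described in this thread.

```python
from typing import Callable, Dict, Protocol


class IPAMClient(Protocol):
    """What the consumer would depend on; providers stay behind this interface."""

    def reserve(self, node: str) -> str: ...

    def release(self, node: str) -> None: ...


class StaticMapIPAM:
    """Trivial provider backed by a fixed node-to-address mapping."""

    def __init__(self, mapping: Dict[str, str]):
        self._mapping = dict(mapping)

    def reserve(self, node: str) -> str:
        try:
            return self._mapping[node]
        except KeyError:
            raise LookupError(f"no address configured for node {node!r}") from None

    def release(self, node: str) -> None:
        pass  # nothing to free for a static mapping


# Registry of available providers, keyed by the name given in configuration.
PROVIDERS: Dict[str, Callable[..., IPAMClient]] = {"static-map": StaticMapIPAM}


def new_ipam_client(provider: str, **config) -> IPAMClient:
    """Select the implementation at runtime from a provider name."""
    try:
        factory = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown IPAM provider {provider!r}") from None
    return factory(**config)


client = new_ipam_client("static-map", mapping={"compute-0.k8s.com": "169.100.100.15"})
print(client.reserve("compute-0.k8s.com"))
```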

qinqon avatar Jan 26 '22 09:01 qinqon