
Scheduling extender plugin for LINSTOR

Open kvaps opened this issue 3 years ago • 17 comments

Follows up https://github.com/piraeusdatastore/piraeus-operator/pull/219#issuecomment-1027282815

Yeah, now I understand why you decided to get rid of Stork. It's quite huge and slow, contains a lot of complex controllers, requires extensive permissions on the cluster, and does not support secure communication with kube-scheduler. As a result, it is updated slowly.

Nevertheless, I found that the Stork scheduler extender component does not require such heavy maintenance.

E.g. it works fine with Kubernetes around v1.22; only the config format for kube-scheduler has changed.
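For illustration, wiring an extender into the newer config format looks roughly like this (the port, verbs, and weight here are assumptions for the sketch, not the project's actual values):

```yaml
# KubeSchedulerConfiguration replaced the old Policy-based extender
# config around Kubernetes 1.22.
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  - urlPrefix: http://127.0.0.1:8099   # extender running as a localhost sidecar
    filterVerb: filter
    prioritizeVerb: prioritize
    weight: 5
    nodeCacheCapable: false
```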

I'm still convinced of my point: we shouldn't use the Kubernetes CSI topology feature until it can be dynamically updated after a volume is created. So I decided to write a separate LINSTOR scheduler extender based on the existing Stork driver, and to run it as a sidecar to the kube-scheduler pod.

This solves the following problems:

  • Now we have an independent linstor-scheduler with a small amount of code and no superfluous controllers/CRDs/whatever.
  • Permissions are reduced to the same set that kube-scheduler itself has.
  • Secure communication over localhost instead of raw HTTP requests sent across the Kubernetes network.
  • The code is backward-compatible with the current Stork driver.

Link to the project: https://github.com/kvaps/linstor-scheduler-extender Take a look at the deploy folder; it has a self-contained example.

kvaps avatar Feb 08 '22 00:02 kvaps

Nice work. I'll try to take a closer look at the project soon. One of the things I'd like to make sure is that this project does:

  • Work with topology, if enabled. I don't want to lock people into using only topology or only the scheduler plugin.
  • Work with late binding. This is something that broke STORK when I last tried it. I.e. the scheduler gets called for a Pod which has an unbound volume claim, because the claim is waiting for the volume to be scheduled first.

I haven't looked at what is reused from the STORK plugin, but if it's not too much work I would like to not depend on Stork. That way, it is much easier to update.

WanzenBug avatar Feb 14 '22 08:02 WanzenBug

Work with late binding. This is something that broke STORK when I last tried it. I.e. the scheduler gets called for a Pod which has an unbound volume claim, because the claim is waiting for the volume to be scheduled first.

Works like a charm. linstor-scheduler just assigns equal ranks to all the nodes for the first binding, letting Kubernetes decide.
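The "equal ranks" behavior can be sketched like this (a minimal stand-alone illustration; the `HostPriority` shape mirrors the scheduler-extender prioritize API, but the local types and the score value are assumptions, not the project's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HostPriority mirrors the shape of the scheduler-extender
// "prioritize" response (defined locally so the sketch is
// self-contained).
type HostPriority struct {
	Host  string `json:"Host"`
	Score int64  `json:"Score"`
}

// equalScores ranks every candidate node identically, so the
// default scheduler's own scoring decides where the first
// replica lands when the volume claim is still unbound.
func equalScores(nodes []string) []HostPriority {
	priorities := make([]HostPriority, 0, len(nodes))
	for _, node := range nodes {
		priorities = append(priorities, HostPriority{Host: node, Score: 50})
	}
	return priorities
}

func main() {
	out, _ := json.Marshal(equalScores([]string{"node-a", "node-b", "node-c"}))
	fmt.Println(string(out))
}
```

Once volumes exist, the extender can instead rank nodes that hold a replica higher, which is exactly where the dynamic scheduling advantage over static topology comes from.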

Work with topology, if enabled. I don't want to lock people into using only topology or only the scheduler plugin.

Not tested yet, but I think it should work as well; the only difference is that the PV has a hardcoded nodeAffinity.
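For context, a topology-provisioned PV pins itself to hosts roughly like this (all names here are illustrative, not taken from a real cluster):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0000-illustration      # hypothetical name
spec:
  csi:
    driver: linstor.csi.linbit.com
    volumeHandle: pvc-0000-illustration
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-a           # nodes holding a replica at provisioning time
                - node-b
```

This affinity is fixed at provisioning time, which is why a scheduler extender that queries LINSTOR live can make better placement decisions later on.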

I haven't looked at what is reused from the STORK plugin, but if it's not too much work I would like to not depend on Stork. That way, it is much easier to update.

Of course I wanted to start the new project from k8s-scheduler-extender-example, or to simply disassemble the Stork libs to get rid of all their extra parts, but I didn't have much time for that. The scheduler-extender interface is stable and hasn't changed for ages, so I decided not to touch what already works fine.

The main goal was to get rid of all the moving parts (CRDs and all these fancy Stork controllers - we don't need them), so I reused a single simple piece: github.com/libopenstorage/stork/pkg/extender.

In fact, the current implementation allows dropping in the Stork driver, but it can be rewritten at any time. Since the API will not change, this can be done in any future version of linstor-scheduler-extender.

kvaps avatar Feb 14 '22 11:02 kvaps

only change is that PV has hardcoded nodeAffinity.

BTW, not really related to the issue at hand, but have you looked into the "advanced" access policy patterns: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-kubernetes-params-allow-remote-volume-access

Using them makes the nodeAffinity in the PV less (or more) strict, as desired. It's still not as dynamic as a custom scheduler config, but at least it can decouple the PV from bare hostnames.
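From the linked guide, such a policy is set as a StorageClass parameter; roughly like this (the class name is illustrative, and the exact parameter key may differ between linstor-csi versions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-zone-local         # illustrative name
provisioner: linstor.csi.linbit.com
parameters:
  # Allow attaching from any node in the same zone as a replica,
  # instead of pinning the PV to exact hostnames.
  linstor.csi.linbit.com/allowRemoteVolumeAccess: |
    - fromSame:
        - topology.kubernetes.io/zone
```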

WanzenBug avatar Feb 14 '22 12:02 WanzenBug

So, we had a bit of an internal discussion, and the consensus is: yes, we want to support such a scheduler plugin.

@kvaps in https://github.com/piraeusdatastore/piraeus-operator/pull/269#issuecomment-1038971113 you indicated you would be willing to contribute your existing work to piraeus. Did I understand that right?

If so, I need to figure out the proper steps for that. Being part of the CNCF, it might not be as straightforward as forking the repo.

WanzenBug avatar Feb 16 '22 13:02 WanzenBug

Sure. Let's do that.

I'm already an active CNCF contributor and have signed the CLA as well. Please guide me through the further steps.

kvaps avatar Feb 16 '22 15:02 kvaps

Moving forward with this, I'd say we fork your existing repo (we can request "detaching" the fork later, if needed).

After we set up all the images, we add a new chart which deploys the image to https://github.com/piraeusdatastore/helm-charts

Then we can think about adding it as a (optional) dependency to the operator.

@kvaps https://github.com/deckhouse/linstor-scheduler-extender that's the right repo to fork, right?

WanzenBug avatar Feb 25 '22 09:02 WanzenBug

https://github.com/deckhouse/linstor-scheduler-extender that's the right repo to fork, right?

Both repos are identical.

Hey, I can try to transfer ownership of my repo. Which GitHub organization should I choose: piraeusdatastore or linbit?

kvaps avatar Feb 25 '22 10:02 kvaps

Please change it to piraeusdatastore

WanzenBug avatar Feb 25 '22 10:02 WanzenBug

No way:

 You don’t have the permission to create public repositories on piraeusdatastore 

It seems I need to be a member of the organization first :)

kvaps avatar Feb 25 '22 10:02 kvaps

Just realized I'm not an owner in this org, so I can't add you, either :upside_down_face:

If you give me permissions on your repo, I can initiate the transfer, since I can create public repos.

WanzenBug avatar Feb 25 '22 10:02 WanzenBug

Okay, the invite has been sent.

kvaps avatar Feb 25 '22 10:02 kvaps

Still had to fork it. Maybe only the owner can trigger the transfer? :shrug:

https://github.com/piraeusdatastore/linstor-scheduler-extender

I'll see about detaching the fork and adding you as a collaborator

WanzenBug avatar Feb 25 '22 11:02 WanzenBug

Yeah, you have to write to GitHub support for that. And please make a first release.

kvaps avatar Feb 25 '22 11:02 kvaps

It might be important to change the module name first: s/kvaps/piraeusdatastore/g

https://github.com/piraeusdatastore/linstor-scheduler-extender/blob/0136a32bb596483bc8ce6843c0bb9cb4893d9f91/go.mod#L1
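A sketch of that rename, demonstrated on a scratch copy of the go.mod module line so it is runnable anywhere (assumes GNU sed; in the real repo the import paths need rewriting too):

```shell
# In the actual repo the commands would be something like:
#   go mod edit -module github.com/piraeusdatastore/linstor-scheduler-extender
#   find . -name '*.go' -exec sed -i 's|github.com/kvaps/|github.com/piraeusdatastore/|g' {} +
# Demonstrated here on a scratch file:
set -eu
printf 'module github.com/kvaps/linstor-scheduler-extender\n' > /tmp/go.mod
sed -i 's|github.com/kvaps/|github.com/piraeusdatastore/|' /tmp/go.mod
cat /tmp/go.mod
```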

kvaps avatar Feb 25 '22 11:02 kvaps

I've invited you to the repo. Changes are here, please review: https://github.com/piraeusdatastore/linstor-scheduler-extender/pull/1

WanzenBug avatar Feb 25 '22 11:02 WanzenBug

Detach already went through :)

WanzenBug avatar Feb 25 '22 11:02 WanzenBug

Just realized I'm not an owner in this org, so I can't add you, either :upside_down_face:

That's sad. I would love to have one more organization badge on my GitHub profile and to feel like part of the project.

Maybe that fact would even improve the quality of my contributions 😉

kvaps avatar Mar 06 '22 13:03 kvaps