luigi icon indicating copy to clipboard operation
luigi copied to clipboard

Replace deprecated pykube(-ng) library with official Kubernetes python library

Open AndrewFarley opened this issue 4 years ago • 14 comments

Description

The pykube and pykube-ng libraries that Luigi depends on are flawed and don't seem to support more recent versions of Kubernetes. Both libraries appear to be abandonware, so we should pivot to the official python library for Kubernetes. This pivot will simplify some of the handlings of this library, as certain configuration values are already supported and configurable via environment variables, as is the widely accepted standard for Kubectl and its supported tools and toolchain. This also adds the following...

  • Better/more debug logging about what its doing to/in Kubernetes
  • Event logging before the pod starts up (so you can see what its doing, eg Pulling Image, etc)
  • Cleaned up code a bit
  • Able to detect cluster scaling up (contribution from @dav009)
  • Able to detect cluster scaling up won't work and failing
  • WIP Adding more robust tests to ensure we're handling all success and failure conditions properly

Motivation and Context

Luigi wasn't working for me on a number of my clusters, and after further investigation this was largely due to the outdated and deprecated libraries in-use. I've found numerous other projects which have pivoted over to the official library with a lot of success and used those largely as examples for how to do this.

I've added extra comments and documentation in all places that make sense for others to be able to more-easily understand how/what to do to use and adopt Kubernetes support for Luigi.

Have you tested this? If so, how?

I've updated the unit tests and run a few test pipelines successfully with all features.

Items to complete

  • [x] Implement core feature-set and pivot to the new library
  • [x] Add documentation
  • [ ] WIP Add/update the test coverage
  • [ ] WIP Add ability to watch logs during a run, not just after a run
  • [ ] Merge and publish a new version for all to enjoy

AndrewFarley avatar Aug 03 '21 07:08 AndrewFarley

Note: I am merging and fixing up the MR #3089 in this MR. Picture of proof of work, added tons of debug logging as well. MR will be updated shortly I believe addressing all issues...

Screen Shot 2021-08-30 at 3 07 28 AM

AndrewFarley avatar Aug 29 '21 15:08 AndrewFarley

@dav009 I merged(ish) your branch into mine logicall, but added code to detect if a scale up failed (aka, it won't fit even with scale up) which the previous code would infinite loop because it wasn't looking into the Kubernetes event stream. Pic of proof...

Screen Shot 2021-08-30 at 3 10 12 AM

AndrewFarley avatar Aug 29 '21 15:08 AndrewFarley

@dlstadther Please have another look and review code, but please don't merge yet, I've decided I need to add some further tests because it's really abysmal to not really have any decent ones here to ensure things work as they should, there's so many different kinds of failure conditions which are trying to be detected here and I think we should test them. I'm not fantastic with Pytest so it might be a bit slow, but I'm working on adding tests which I think are critical before merging this. Only able to work on this in free time, so if anyone else can/wants to help me with tests, please do. I've left some of my hacking at tests commented out.

AndrewFarley avatar Aug 30 '21 05:08 AndrewFarley

And @dav009 please review the code here and make sure I encapsulated what you were trying to do with your MR. Overall, this MR will have a lot better QoL support for Kubernetes. There's still tons more things I think could/should be done in this library but mostly pedantic or improvements and could be in future merges instead. I'm still working on redoing some of the tests and will sorta include your tests once I fix them. Things I'd love to see in the future are for example on API failure or timeout or general internet jitter/brief unavailability some automated retry and/or exponential backoff mechanism.

AndrewFarley avatar Aug 30 '21 06:08 AndrewFarley

@AndrewFarley amazing job, thank you. I was also quite frustrated with how outdated this is. I will try to review this soonish.

😅 I made an internal library for my organisation that addresses these (old pykube) and also take into account other issues (logging). I am testing it at the moment. We were planning on releasing it at some point.

dav009 avatar Aug 31 '21 07:08 dav009

@AndrewFarley In my team we ended up implementing this kubernetes contrib update as an independent package: https://github.com/optimizely/kubeluigi

dav009 avatar Nov 12 '21 05:11 dav009

Ping. Folks, I'm not using Luigi, but this MR still really needs to happen. If anyone has time to rebase and take over this PR, please let me know I'm happy to transfer my fork to you or add you permissions. I know a few folks are using my fork already because it's necessary if you want Kubernetes support, so perhaps one of you can take the reins?

AndrewFarley avatar Feb 14 '22 06:02 AndrewFarley

Hi, why was this not reviewed yet? I tested it locally and it seems to work great..

nlahmi-cymotive avatar Jan 24 '23 14:01 nlahmi-cymotive

Hah, I wrote this 2 years ago and still no one merged it. Is this project dead? I'm really glad this client of mine I recommended to not choose Luigi. Eesh.

AndrewFarley avatar Jan 26 '23 09:01 AndrewFarley

I just rebased and re-pushed and validated this works on Kubernetes still. Hopefully, in the hopes and dreams of fellow engineers, someone can see it in their heart to merge this. Or, just don't use Kubernetes. :P Up to you

AndrewFarley avatar Jan 26 '23 09:01 AndrewFarley

Apologies for this not being reviewed.

As far "is this project dead?" - there is not consistent maintainer involvement (myself included). The project is still owned by Spotify, but not actively developed (to my knowledge, mostly due to their articles related to their intent to replace Luigi with Flyte (note, this is just my personal assumption)). While I have permissions to merge PRs and push to PyPi, I have not used Luigi for development since changing employers a few years ago. I wish to see this project continue, but without an active community and additional maintainer participation, there's only so much I can do. I sympathize with your disappointment and I'm here to help as I reasonably can in my spare time.

dlstadther avatar Jan 28 '23 18:01 dlstadther

@dlstadther I've just fixed the code styling to satisfy that need. I indeed also don't use Luigi and more, and I don't really have tons of time to write/update/etc the tests to validate that this works. I can confirm it did work for me once upon a time, and others above have recently checked its (basic) functionality. I can/should leave the TODOs in the test code commented out, in hopes that someone else can take up the reigns and add the tests.

At least the flake tests pass now - https://github.com/spotify/luigi/actions/runs/4034116089/jobs/6935056885

But it appears there are some other errors now? I don't have much time this weekend, but I will try to review this further sometime soon.

AndrewFarley avatar Jan 28 '23 23:01 AndrewFarley

@AndrewFarley , thank you for the quick turnaround to address the failing flake8 tests. I'm good with getting this included once test results allow me to merge. I hope whomever will come use this updated k8s support within luigi will contribute back any improvements and test coverage items.

I just checked the current failures and they're all 1 failure:

  • the test class tries to load a kubeconfig and cannot find one - kubernetes_api.config.load_kube_config()

So looks to me like once a kube config is mocked (or however it is expected) for tests, that we'd be good to go for now.

dlstadther avatar Jan 29 '23 01:01 dlstadther

Well, in order for the Kubernetes stuff to work, it needs to initialize the library and config somehow. I will need to have a think and a google around of some creative way to workaround this for testing only which won't have K8s support.

AndrewFarley avatar Jan 29 '23 20:01 AndrewFarley