woodpecker
Native Kubernetes Support
try to re-base #23
#9
TODO
- [x] add docs
- [x] update helm charts
TEST it NOW
Is there a way to specify the storageClass, and maybe even the size, via the backend config?
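For illustration, a minimal sketch (not code from this PR) of how the backend could feed a configured storage class and size into the claim spec. The env var names and the helper are hypothetical, and it assumes a k8s.io/api version where the claim's Resources field is a corev1.ResourceRequirements:

```go
package kubernetes

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildPVC shows how a configured storage class and size could be applied to
// the claim used for a pipeline volume. The env var names are placeholders.
func buildPVC(name, namespace string) *corev1.PersistentVolumeClaim {
	storageClass := os.Getenv("WOODPECKER_BACKEND_K8S_STORAGE_CLASS") // hypothetical
	size := os.Getenv("WOODPECKER_BACKEND_K8S_VOLUME_SIZE")           // hypothetical
	if size == "" {
		size = "10Gi"
	}

	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse(size),
				},
			},
		},
	}
	if storageClass != "" {
		// Only set it when configured, so the cluster default class applies otherwise.
		pvc.Spec.StorageClassName = &storageClass
	}
	return pvc
}
```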
Now this is very pedantic, but it might make sense to alias the import as corev1 instead of v1.
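For reference, the aliasing in question looks like this in client-go based code (a tiny illustrative fragment, not taken from the PR):

```go
package kubernetes

import (
	corev1 "k8s.io/api/core/v1"                   // core API group, aliased so plain "v1" stays unambiguous
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" // object metadata, conventionally metav1
)

// Using both aliases together makes it obvious which API group a type belongs to.
var examplePod = corev1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "example"},
}
```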
I think proper volume support is one of the biggest challenges we have to solve in this PR. We should allow specifying volume details. Do you know if there is some kind of Kubernetes storage, provided by most Kubernetes installations, that allows us to mount the same volume into multiple pods?
@anbraten It depends: do you want a ReadOnly mount or ReadWrite? There are ReadOnlyMany (ROX) and ReadWriteMany (RWX) modes that can be set on a PersistentVolumeClaim: see https://stackoverflow.com/a/62545427
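As a small sketch of that (not code from this PR; the helper name is hypothetical), the access mode is just a field on the claim spec:

```go
package kubernetes

import corev1 "k8s.io/api/core/v1"

// markShared switches a claim to ReadWriteMany so step pods scheduled on
// different nodes can mount the same workspace; corev1.ReadOnlyMany (ROX)
// would be the read-only variant.
func markShared(pvc *corev1.PersistentVolumeClaim) {
	pvc.Spec.AccessModes = []corev1.PersistentVolumeAccessMode{corev1.ReadWriteMany}
}
```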
I think most deployments only support RWO (ReadWriteOnce); I'm thinking of AWS EBS volumes and Rook/Ceph. I don't think we should expect or require anything more complicated.
Pipelines are mostly serial, so it should be enough.
Perhaps with parallel stages the jobs can be schlepped into a "mega" pod.
@dmolik it may be simpler to start with RWX first, as there are some options out there:
- AWS with Amazon EFS CSI driver see https://aws.amazon.com/de/premiumsupport/knowledge-center/eks-persistent-storage/
- Google with Filestore CSI driver see https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/filestore-csi-driver
- Azure with Azure Files see https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
- For on Prem: Rook supports RWX via CephFS see https://github.com/ceph/ceph-csi and https://github.com/rook/rook/issues/543#issuecomment-388060949
- Or use NFS CSI driver see https://github.com/kubernetes-csi/csi-driver-nfs/blob/9811fe4c6fa00169b4f80832cd807c0203fa0059/deploy/example/pvc-nfs-csi-dynamic.yaml
However, I am not sure how performant they are, but in my opinion it is simpler to start with them at the beginning than to use some workarounds with RWO. The approach you described can still be explored afterwards if there are some issues with the RWX approach.
What do you think?
You can, in general, use a network drive. https://kubernetes.io/docs/concepts/storage/volumes/#nfs
But these are sometimes problematic (usually when they fail on unmount) and should be used with care.
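For completeness, a pod can mount such a network drive directly, e.g. an NFS export, without going through a PVC. A hedged sketch with placeholder server/path values:

```go
package kubernetes

import corev1 "k8s.io/api/core/v1"

// nfsWorkspace mounts an NFS export directly as a pod volume, without a PVC.
// Server and path are placeholders.
func nfsWorkspace() corev1.Volume {
	return corev1.Volume{
		Name: "workspace",
		VolumeSource: corev1.VolumeSource{
			NFS: &corev1.NFSVolumeSource{
				Server: "nfs.example.internal", // placeholder
				Path:   "/exports/woodpecker",  // placeholder
			},
		},
	}
}
```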
Personally I don't think that sticking to network drives is a good idea. We're mostly using Kubernetes with local storage, because it is much faster.
So I think we have to make that configurable ...
> Personally I don't think that sticking to network drives is a good idea. We're mostly using Kubernetes with local storage, because it is much faster.
@kvaster This does not work for data that has to be shared across pods (because they can be located on different nodes), and even if you host them all on one node, this approach does not scale well and is not really fault tolerant: if your node dies, all your workload and data dies with it. But maybe I misunderstood you, so please correct me if I am wrong.
Existing Kubernetes deployments are just that: dind agents that run all parallel jobs on the same node, and they don't have network RWX volumes either.
Generally there can be two types of volumes. The first: one PVC for all pods created during a build, and this PVC can be RWX (and network-backed). The second: a PVC per pod.
> @kvaster This does not work for data that has to be shared across pods (because they can be located on different nodes), and even if you host them all on one node, this approach does not scale well and is not really fault tolerant: if your node dies, all your workload and data dies with it.
I think this is always a question of cache, bandwidth, disk speed, etc. It is good to have flexibility. Sometimes it is much better to use local disks while building, with some kind of cache. Also, this may be combined with a network drive for sharing some part of the data while building artifacts in parallel.
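A sketch of that combination (the volume names and the claim name are hypothetical): a node-local emptyDir for fast caches next to a shared, network-backed claim for data that parallel steps must both see:

```go
package kubernetes

import corev1 "k8s.io/api/core/v1"

// mixedVolumes combines a fast node-local scratch volume for caches with a
// shared, network-backed claim for artifacts produced by parallel steps.
func mixedVolumes() []corev1.Volume {
	return []corev1.Volume{
		{
			Name: "local-cache",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{}, // node-local, fast, not shared
			},
		},
		{
			Name: "shared-artifacts",
			VolumeSource: corev1.VolumeSource{
				PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
					ClaimName: "artifacts-rwx", // hypothetical RWX-backed claim
				},
			},
		},
	}
}
```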
Can I inquire about the progress on this?
Focus is on #784 - after that one gets in, we can move forward to native by reusing code and refactoring more ...
The mentioned pull is not merged yet, as it touches code outside of pipeline/backend/*
- so these changes have to be ported and merged separately first
To be fair, I am not 100% sure if we should take #784 over this one, as I don't see the point in using kubectl (it does not work for scratch images, rpm/deb packages, etc. out of the box). My suggestion would be to migrate all ideas / pieces from #784 into this one.
I agree with you: if we want to support k8s, using the k8s SDK is good. kubectl needs to be available locally to use it.
> To be fair, I am not 100% sure if we should take #784 over this one, as I don't see the point in using kubectl (it does not work for scratch images, rpm/deb packages, etc. out of the box). My suggestion would be to migrate all ideas / pieces from #784 into this one.
That would be even better - it just needs a lot more work to be done ...
If you have a look at the code, it's not too different. We already have most of the general functions of the kubectl backend in this PR as well.
I can't wait for it to be released.
Hi, the reason I used kubectl is simplicity of implementation (apply, delete). That said, by changing the functions in client.go you can use the Kubernetes Go API; all the interactions with Kubernetes are only there. Sadly, I do not have the time to do that.
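To make that concrete, here is a minimal sketch of what a client-go based client.go could start from, assuming in-cluster config with a kubeconfig fallback (the function name is hypothetical, not the PR's actual code):

```go
package kubernetes

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// newClientset prefers the in-cluster service account and falls back to a
// kubeconfig file (useful for local development), instead of shelling out
// to kubectl.
func newClientset(kubeconfig string) (kubernetes.Interface, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		// Not running inside a cluster; try the given kubeconfig path.
		config, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			return nil, err
		}
	}
	return kubernetes.NewForConfig(config)
}
```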
We have been using the kubectl backend in development for about a month now with minor issues (not detecting event-based crashes is the worst one; it only detects BackOff for now). Once I have a fuller list I'll update these errors.
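As an illustration of the kind of detection that is still missing, one could inspect container waiting reasons beyond plain BackOff. This is only a sketch, not the backend's actual logic:

```go
package kubernetes

import corev1 "k8s.io/api/core/v1"

// failureReason checks container statuses for waiting reasons that indicate a
// crashed or unstartable step, beyond the plain BackOff case.
func failureReason(pod *corev1.Pod) (string, bool) {
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Waiting == nil {
			continue
		}
		switch cs.State.Waiting.Reason {
		case "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull", "CreateContainerConfigError":
			return cs.State.Waiting.Reason, true
		}
	}
	return "", false
}
```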
@xuecanlong I can safely say that we are now in the beta phase of that, and creating an image that uses it is very simple.
Agreed, using kubectl is faster for operating the cluster, but it is not convenient for complex operations.
Codecov Report
Merging #552 (bf3a919) into master (dbbd369) will decrease coverage by 0.01%. The diff coverage is 0.00%.
```diff
@@            Coverage Diff             @@
##           master     #552      +/-   ##
==========================================
- Coverage   49.61%   49.59%   -0.02%
==========================================
  Files          86       86
  Lines        6553     6555       +2
==========================================
  Hits         3251     3251
- Misses       3111     3113       +2
  Partials      191      191
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| cmd/agent/agent.go | 0.00% <0.00%> (ø) | |
Docs are missing
Deployment of preview was successful: https://woodpecker-ci-woodpecker-pr-552.surge.sh
This PR should be mainly done now. Some pipeline features are still missing, but I would add them from time to time. I will add the currently open points to https://github.com/woodpecker-ci/woodpecker/issues/9#issuecomment-483979755.
Else, for a starting point ... we can merge as soon as you call it "ready for review".
Not sure how to do this properly since this in and of itself is a PR, but I created a PR for this branch to implement a setting for using RWO access mode: https://github.com/anbraten/woodpecker/pull/3
https://github.com/anbraten/woodpecker/pull/4 needs a merge
@anbraten hope you are fine with how it's done as a ContextKey for now ...
I might refactor how backends get added and also add a new interface we can use for backend-specific config, but that can be done later.
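For reference, the ContextKey approach mentioned above typically looks like this in Go (the Config fields and function names here are hypothetical, not the PR's actual ones):

```go
package kubernetes

import "context"

// configKey is an unexported type, so the context key cannot collide with
// keys set by other packages.
type configKey struct{}

// Config stands in for the backend-specific settings; the real fields differ.
type Config struct {
	StorageClass string
	VolumeSize   string
}

// WithConfig stores the backend config on the context.
func WithConfig(ctx context.Context, c Config) context.Context {
	return context.WithValue(ctx, configKey{}, c)
}

// FromContext retrieves it again; the zero value is returned if nothing was stored.
func FromContext(ctx context.Context) Config {
	c, _ := ctx.Value(configKey{}).(Config)
	return c
}
```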
Been testing this for several weeks, also just built and deployed locally after the last push to this PR, things are still working, so, FWIW: Tested-by: Stijn Tintel [email protected]