kapp-controller
Use 'sieve' to check for distributed/concurrency related issues in kapp-controller
Describe the problem/challenge you have: It's very hard to prove the absence of bugs, and distributed/concurrency bugs in particular are classically very hard to find, detect, and reproduce.
Describe the solution you'd like: Luckily for us, some folks at VMware and UIUC are working on this "sieve" tool: https://github.com/sieve-project/sieve
Anything else you would like to add: Note that sieve is marketed as still in very early stages; imo this would be interesting to explore but if it's rocky we would have the options of reaching out directly for assistance and/or putting this thought onto the back-burner for a few months and coming back to see if the tool is more mature.
Note also that the sieve tool is described in the KubeCon talk "Automated, Distributed Systems Testing for Kubernetes Controllers - Lalith Suresh, VMware & Xudong Sun, University of Illinois at Urbana-Champaign"
Vote on this request
This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.
👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"
We are also happy to receive and review Pull Requests if you want to help work on this issue.
This issue is being marked as stale due to a long period of inactivity and will be closed in 5 days if there is no response.
Next step here would be to run sieve against kapp-controller and report issues surfaced in this issue. Maybe longer term this could be part of our CI process if we find the tool to be useful in surfacing issues. For now, we'll start by doing some initial research on the tool.
We may not get to this immediately, so we are open to contributions if anyone is interested.
@danielhelfand @joe-kimmel-vmw just saw this. We'd be happy to look into testing kapp-controller with Sieve in the coming weeks. (cc @embano1 @marshtompsxd)
Hi, @danielhelfand @joe-kimmel-vmw Thanks for your interest in Sieve. We are currently working on porting and testing the kapp-controller using Sieve. One necessary step in porting is to build the docker image from the source. However, I encountered some difficulties. I tried to use the Dockerfile in the repo to build the image by
docker build --no-cache -t xudongs/carvel-kapp-controller:latest .
But the building failed with the following error:
Step 19/38 : COPY . .
---> 86800275cf64
Step 20/38 : RUN CGO_ENABLED=0 GOOS=linux go build -mod=vendor -ldflags="-X 'main.Version=$KCTRL_VER' -buildid=" -trimpath -o controller ./cmd/main.go
---> Running in 969352fe57d4
go: inconsistent vendoring in /go/src/github.com/vmware-tanzu/carvel-kapp-controller:
cloud.google.com/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
github.com/NYTimes/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
github.com/PuerkitoBio/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
github.com/PuerkitoBio/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
......
To ignore the vendor directory, use -mod=readonly or -mod=mod.
To sync the vendor directory, run:
go mod vendor
The command '/bin/sh -c CGO_ENABLED=0 GOOS=linux go build -mod=vendor -ldflags="-X 'main.Version=$KCTRL_VER' -buildid=" -trimpath -o controller ./cmd/main.go' returned a non-zero code: 1
Could you let me know how to address this error? Or did I build the image wrongly? If I am able to successfully build the image, I will finish porting and start testing the controller. Thanks!
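For readers who hit the same "inconsistent vendoring" error: the message itself suggests re-syncing the vendor directory with go mod vendor. A minimal sketch of that route follows. The function name resync_vendor is hypothetical, and the thread does not confirm the root cause; printing the Go version is included only because a toolchain mismatch is one possible explanation (an assumption, not a diagnosis).

```shell
# resync_vendor [REPO_DIR]
# Hypothetical helper: re-sync vendor/ with go.mod, then confirm that a
# vendored build works before retrying `docker build`.
resync_vendor() {
  local repo_dir="${1:-.}"
  (
    cd "$repo_dir" || exit 1
    go version                            # record which toolchain is in use
    go mod vendor || exit 1               # rewrite vendor/ and vendor/modules.txt from go.mod
    go build -mod=vendor ./... || exit 1  # same -mod=vendor mode the Dockerfile step uses
    git status --short -- go.mod go.sum vendor  # show what the re-sync changed
  )
}

# Usage (from a checkout of the repository):
#   resync_vendor .
```

If git status reports changes under vendor/, the checked-in vendor directory and go.mod disagreed for the local toolchain, and that diff is worth attaching to the report.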
@marshtompsxd - Thanks for following up!
It's very interesting that you hit that error! To be honest we don't usually invoke docker build directly - the build scripts that we run, and that are run in our github actions, all invoke kbld: https://github.com/vmware-tanzu/carvel-kapp-controller/blob/develop/hack/deploy.sh#L5
the relevant subcommand there is ytt -f config/ | kbld -f-
which renders the ytt templates and passes them through kbld, which in turn calls Docker build as configured by the templates. Is it possible for you to use that flow to build the container?
Hi @joe-kimmel-vmw Thanks for the reply. I tried the command and I think I managed to build the image. After the command succeeded I listed all the docker images:
REPOSITORY   TAG                                                                                       IMAGE ID       CREATED          SIZE
kbld         kapp-controller-sha256-fac429e6eea664d64c864c3d8ece30d4406359926fbfefd3aaf4ba1d882d31be   fac429e6eea6   11 seconds ago   578MB
<none>       <none>                                                                                    e1acdea340cb   30 seconds ago   1.47GB
<none>       <none>                                                                                    f5d1ac9784b3   2 minutes ago    445MB
photon       4.0                                                                                       b432359f4c98   4 days ago       36.9MB
I think the first one should be the controller image.
BTW, is it possible to specify the tag of the controller image when using the above command to build? I assume the kbld image is the controller image, but the long tag seems to be random.
Thanks @marshtompsxd - very excited you got it to work! Please let us know how we can support your investigations.
to your question
is it possible to specify the tag of the controller image when using the above command to build?
as you saw, kbld really wants to lock down a specific SHA for reproducibility. However, you can provide multiple tags to docker via the command line. in our case you would add the tag to the rawOptions list documented here
We're also still confused by your original error, as we would expect docker build to just work. However, if you're past it, that's great - thanks for your efforts, and again, let us know how we can support you!
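As an alternative to editing the kbld configuration, the image that kbld produced can be given a friendlier name afterwards with docker tag. This is a sketch of that workaround, not something proposed in the thread. The function names and the target tag are hypothetical, and it assumes the wanted image is the first one listed under the kbld repository.

```shell
# kbld_image_ref: read "repository:tag" lines on stdin and print the first
# one whose repository is "kbld".
kbld_image_ref() {
  grep -m1 '^kbld:'
}

# retag_kbld_image TARGET
# Hypothetical helper: add TARGET as a second name for the kbld-built image.
retag_kbld_image() {
  local target="$1" ref
  if ! ref="$(docker images --format '{{.Repository}}:{{.Tag}}' | kbld_image_ref)"; then
    echo "no kbld image found" >&2
    return 1
  fi
  docker tag "$ref" "$target"
}

# Usage (the target name is only an example):
#   retag_kbld_image xudongs/carvel-kapp-controller:latest
```

The content-derived tag stays in place, so kbld's reproducibility guarantee is unaffected; the new tag is just a second name for the same image ID.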
Hi @joe-kimmel-vmw , thanks for your help and sorry for the late response. We have ported the kapp controller with Sieve, but unfortunately the current version of Sieve cannot test it because kapp controller does not issue updates to the k8s cluster (Create, Delete, Update, Patch) through the controller-runtime APIs -- Sieve assumes all the updates go through the controller-runtime APIs.
To address this, we are currently implementing a new feature to make Sieve more general and independent of controller-runtime. We will let you know when the feature is implemented and share the test results.
hey @marshtompsxd just following up here (I know it's been a fairly long time since the last comment) - did you all manage to get that feature implemented? We would still like to run sieve against kapp-controller, so any further updates would be really helpful! Thanks
Hi @neil-hickey Thanks for following up!
Yes, we have already implemented the feature and now Sieve has no dependency on controller-runtime. We are working on porting kapp-controller right now.
Hi @neil-hickey and others 👋,
I am working with the Sieve team to help test the tool with different controllers. We have finished porting kapp-controller to work with Sieve and were able to test a simple app with it (app.yml). Ideally, workflows that result in multiple objects being created or manipulated behind the scenes are good candidates to run Sieve against (e.g., a CR that results in StatefulSets, pods, volumes, etc. being created). Do you have any workflows you can point us to that would be a good fit to test with Sieve?
Thanks!
That's great @jerrinsg ! Thanks :D
The examples directory has a whole array of different apps you could pick from: https://github.com/carvel-dev/kapp-controller/tree/develop/examples
Some of the "more" complicated ones include:
- https://github.com/carvel-dev/kapp-controller/blob/develop/examples/redis-helm.yml
- https://github.com/carvel-dev/kapp-controller/tree/develop/examples/cert-manager-tce-pkg
I don't have any other "big" examples, though one could be created pretty easily. Kapp-controller can deploy any package or helm chart that has a complicated workload. So if you could think of one that maybe you have used for any tool, we can port it and get it working here. Lemme know!
Thanks @neil-hickey for the pointers! That should be a good start for us. I'll keep you posted on how our testing goes.
Posting a quick update here. I ran a test workload with kapp-controller (deploying guestbook-go). Sieve automatically explored 9 test plans involving crash safety but did not report any issues. On closer inspection I see that most of the interaction with the k8s API is driven by the kapp binary and not kapp-controller. Our Sieve instrumentation tool was only instrumenting kapp-controller, so it missed the interaction between kapp and the k8s API. I'm looking at extending Sieve instrumentation to kapp as well to see if it can detect any bugs.
Nice! Yes, kapp-controller mostly looks at the CRDs and invokes a cycle of fetch, template, deploy, where deploying is the most expensive step and is done via kapp.
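To make the fetch, template, deploy cycle concrete, here is a rough shell sketch of one pass. It is an illustration only, not kapp-controller's code: the function names are hypothetical, the fetch step is assumed to be a git clone, and the template step is assumed to be ytt. It does show why instrumenting only the controller misses most API traffic: the last step hands everything to the kapp binary.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reconcile_once APP_NAME GIT_URL [CONFIG_SUBDIR]
# Hypothetical stand-in for one pass of the cycle.
reconcile_once() {
  local app="$1" repo="$2" subdir="${3:-config}" work
  work="$(mktemp -d)" || return 1

  # fetch: obtain the source (assumed here to be a git repository)
  run git clone --depth 1 "$repo" "$work/src" || return 1

  # template: render the configuration (assumed here to be ytt templates)
  run ytt -f "$work/src/$subdir" --output-files "$work/out" || return 1

  # deploy: hand the rendered manifests to kapp, which talks to the cluster
  run kapp deploy -a "$app" -f "$work/out" --yes
}

# Usage (prints the three commands without running them):
#   DRY_RUN=1 reconcile_once guestbook https://example.invalid/repo.git
```

In this sketch the only process that creates or updates cluster objects is kapp deploy, which matches the observation above that most k8s API interaction comes from kapp rather than from kapp-controller itself.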