
Use 'sieve' to check for distributed/concurrency related issues in kapp-controller

Open joe-kimmel-vmw opened this issue 2 years ago • 16 comments

Describe the problem/challenge you have: It's very hard to prove the absence of bugs, and distributed/concurrency bugs are classically very hard to find, detect, and reproduce.

Describe the solution you'd like: Luckily for us, some folks at VMware and UIUC are working on this "sieve" tool: https://github.com/sieve-project/sieve

Anything else you would like to add: Note that sieve is marketed as still in very early stages; imo this would be interesting to explore but if it's rocky we would have the options of reaching out directly for assistance and/or putting this thought onto the back-burner for a few months and coming back to see if the tool is more mature.

Note also that the sieve tool is described in the KubeCon talk "Automated, Distributed Systems Testing for Kubernetes Controllers - Lalith Suresh, VMware & Xudong Sun, University of Illinois at Urbana-Champaign"


Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help working on this issue.

joe-kimmel-vmw avatar Oct 13 '21 23:10 joe-kimmel-vmw

This issue is being marked as stale due to a long period of inactivity and will be closed in 5 days if there is no response.

github-actions[bot] avatar Nov 23 '21 00:11 github-actions[bot]

Next step here would be to run sieve against kapp-controller and report issues surfaced in this issue. Maybe longer term this could be part of our CI process if we find the tool to be useful in surfacing issues. For now, we'll start by doing some initial research on the tool.

We may not get to this immediately, so we are open to contributions if anyone is interested.

danielhelfand avatar Dec 13 '21 17:12 danielhelfand

@danielhelfand @joe-kimmel-vmw just saw this. We'd be happy to look into testing kapp-controller with Sieve in the coming weeks. (cc @embano1 @marshtompsxd)

lalithsuresh avatar Mar 24 '22 21:03 lalithsuresh

Hi, @danielhelfand @joe-kimmel-vmw Thanks for your interest in Sieve. We are currently working on porting and testing the kapp-controller using Sieve. One necessary step in porting is to build the docker image from the source. However, I encountered some difficulties. I tried to use the Dockerfile in the repo to build the image by

docker build --no-cache -t xudongs/carvel-kapp-controller:latest .

But the building failed with the following error:

Step 19/38 : COPY . .                                                                                                                   
 ---> 86800275cf64                                                                                                                      
Step 20/38 : RUN CGO_ENABLED=0 GOOS=linux go build -mod=vendor -ldflags="-X 'main.Version=$KCTRL_VER' -buildid=" -trimpath -o controller ./cmd/main.go                                                                                                                          
 ---> Running in 969352fe57d4                                                                                                           
go: inconsistent vendoring in /go/src/github.com/vmware-tanzu/carvel-kapp-controller:                                                   
        cloud.google.com/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod                 
        github.com/NYTimes/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod       
        github.com/PuerkitoBio/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod        
        github.com/PuerkitoBio/[email protected]: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod                                                                                                                           
        ......
        To ignore the vendor directory, use -mod=readonly or -mod=mod.
        To sync the vendor directory, run:
                go mod vendor
The command '/bin/sh -c CGO_ENABLED=0 GOOS=linux go build -mod=vendor -ldflags="-X 'main.Version=$KCTRL_VER' -buildid=" -trimpath -o controller ./cmd/main.go' returned a non-zero code: 1

Could you let me know how to address this error? Or did I build the image wrongly? If I am able to successfully build the image, I will finish porting and start testing the controller. Thanks!

marshtompsxd avatar Apr 27 '22 03:04 marshtompsxd

@marshtompsxd - Thanks for following up!

It's very interesting that you hit that error! To be honest we don't usually invoke docker build directly - the build scripts that we run, and that are run in our github actions, all invoke kbld: https://github.com/vmware-tanzu/carvel-kapp-controller/blob/develop/hack/deploy.sh#L5

the relevant subcommand there is ytt -f config/ | kbld -f- which renders the ytt templates and passes them through kbld, which in turn calls Docker build as configured by the templates. Is it possible for you to use that flow to build the container?

joe-kimmel-vmw avatar Apr 27 '22 03:04 joe-kimmel-vmw

Hi @joe-kimmel-vmw Thanks for the reply. I tried the command and I think I managed to build the image. After the command succeeded I listed all the docker images:

REPOSITORY   TAG                                                                                       IMAGE ID       CREATED          SIZE
kbld         kapp-controller-sha256-fac429e6eea664d64c864c3d8ece30d4406359926fbfefd3aaf4ba1d882d31be   fac429e6eea6   11 seconds ago   578MB
<none>       <none>                                                                                    e1acdea340cb   30 seconds ago   1.47GB
<none>       <none>                                                                                    f5d1ac9784b3   2 minutes ago    445MB
photon       4.0                                                                                       b432359f4c98   4 days ago       36.9MB

I think the first one should be the controller image.

marshtompsxd avatar Apr 27 '22 04:04 marshtompsxd

BTW, is it possible to specify the tag of the controller image when using the above command to build? I assume the kbld entry is the controller image, but the long tag seems to be random.

marshtompsxd avatar Apr 27 '22 04:04 marshtompsxd

Thanks @marshtompsxd - very excited you got it to work! Please let us know how we can support your investigations.

to your question

is it possible to specify the tag of the controller image when using the above command to build?

as you saw, kbld really wants to lock down a specific SHA for reproducibility. However, you can provide multiple tags to docker via the command line. in our case you would add the tag to the rawOptions list documented here

We're still confused also by your original error, as we would expect docker build to just work. However if you're past it that's great - thanks for your efforts and again let us know how we can support you!

joe-kimmel-vmw avatar Apr 27 '22 16:04 joe-kimmel-vmw

Hi @joe-kimmel-vmw , thanks for your help and sorry for the late response. We have ported the kapp controller with Sieve, but unfortunately the current version of Sieve cannot test it because kapp controller does not issue updates to the k8s cluster (Create, Delete, Update, Patch) through the controller-runtime APIs -- Sieve assumes all the updates go through the controller-runtime APIs.

To address this, we are currently implementing a new feature to make Sieve more generalized and independent from controller-runtime. We will let you know when the feature is implemented and share the test results.

marshtompsxd avatar May 27 '22 02:05 marshtompsxd
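
The limitation described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `FakeCluster` and `InstrumentedClient` are hypothetical names invented for illustration, and nothing here reflects Sieve's or controller-runtime's real APIs. It shows why a tool that observes writes at one client library sees nothing of writes that reach the cluster by another path.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical stand-in for the Kubernetes API server: a dict of objects."""
    objects: dict = field(default_factory=dict)

    def write(self, verb, name, spec=None):
        if verb == "delete":
            self.objects.pop(name, None)
        else:  # create, update, patch
            self.objects[name] = spec


@dataclass
class InstrumentedClient:
    """Hypothetical wrapper playing the role of an instrumented client library."""
    cluster: FakeCluster
    observed: list = field(default_factory=list)

    def write(self, verb, name, spec=None):
        self.observed.append((verb, name))  # the testing tool sees this write
        self.cluster.write(verb, name, spec)


cluster = FakeCluster()
client = InstrumentedClient(cluster)

# A write that goes through the instrumented library is observed.
client.write("create", "app-status", {"phase": "Reconciling"})

# Writes issued through another path (a different client or a separate binary)
# still reach the cluster but never pass through the wrapper.
cluster.write("create", "deployment/guestbook", {"replicas": 3})
cluster.write("create", "service/guestbook", {"port": 80})

print(len(cluster.objects), "objects in cluster")
print(len(client.observed), "writes observed by the tool")
```

In this toy run the cluster ends up holding more objects than the tool observed writes for, which is the kind of gap that motivates instrumenting below the client library.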

hey @marshtompsxd just following up here (I know it's been a fairly long period of time since the last comment) - did you all manage to get that feature implemented? We would still like to run sieve against kapp-controller, so any further updates would be really helpful! Thanks

neil-hickey avatar Feb 22 '23 20:02 neil-hickey

Hi @neil-hickey Thanks for following up! Yes, we have already implemented the feature and now Sieve has no dependency on controller-runtime. We are working on porting kapp-controller right now.

marshtompsxd avatar Feb 22 '23 21:02 marshtompsxd

Hi @neil-hickey and others 👋,

I am working with the Sieve team to help test the tool with different controllers. We have finished porting kapp-controller to work with Sieve and were able to test a simple app with it (app.yml). Ideally, workflows that result in multiple objects being created or manipulated behind the scenes are good candidates to run Sieve against (e.g., a CR that results in StatefulSets, pods, volumes, etc. being created). Do you have any workflows that you can point us to that'd be a good fit to test with Sieve?

Thanks!

jerrinsg avatar Mar 20 '23 23:03 jerrinsg

That's great @jerrinsg ! Thanks :D

The examples directory has a whole array of different apps you could pick from: https://github.com/carvel-dev/kapp-controller/tree/develop/examples

Some of the "more" complicated ones include:

  • https://github.com/carvel-dev/kapp-controller/blob/develop/examples/redis-helm.yml
  • https://github.com/carvel-dev/kapp-controller/tree/develop/examples/cert-manager-tce-pkg

I don't have any other "big" examples, though one could be created pretty easily. Kapp-controller can deploy any package or helm chart that has a complicated workload. So if you could think of one that maybe you have used for any tool, we can port it and get it working here. Lemme know!

neil-hickey avatar Mar 21 '23 18:03 neil-hickey

Thanks @neil-hickey for the pointers! That should be a good start for us. I'll keep you posted how our testing goes.

jerrinsg avatar Mar 21 '23 22:03 jerrinsg

Posting a quick update here. I ran a test workload with kapp-controller (deploying guestbook-go). Sieve automatically explored 9 test plans involving crash safety but did not report any issues. On closer inspection, I see that most of the interaction with the k8s API is driven by the kapp binary and not kapp-controller. Our sieve instrumentation tool was only instrumenting kapp-controller, so it missed the interaction between kapp and the k8s API. I'm looking at extending sieve instrumentation to kapp as well to see if it can detect any bugs.

jerrinsg avatar Apr 11 '23 00:04 jerrinsg

Nice! Yes, kapp-controller mostly looks at the CRDs and invokes a cycle of fetch, template, deploy, where deploying is the most expensive step and is done via kapp.

neil-hickey avatar Apr 11 '23 20:04 neil-hickey
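
As a rough illustration of what a crash-safety test plan over the fetch, template, deploy cycle could look like, here is a minimal toy sketch. It is not how Sieve or kapp-controller are implemented: the `reconcile` and `run_test_plan` functions, the `Crash` exception, and the dict-based cluster are all hypothetical, and the real deploy step is delegated to kapp rather than written inline. Each plan crashes the controller once after a chosen step, restarts it, and checks that the final state matches a crash-free run.

```python
class Crash(Exception):
    """Simulated controller crash (hypothetical, for illustration only)."""


STEPS = ["fetch", "template", "deploy"]


def reconcile(desired, cluster, crash_after=None):
    """Run one fetch -> template -> deploy cycle against a dict-based fake cluster.

    If crash_after names a step, raise Crash right after that step completes.
    """
    # fetch: obtain the source (here, just a copy of the desired spec)
    fetched = dict(desired)
    if crash_after == "fetch":
        raise Crash("fetch")

    # template: render the objects that should exist in the cluster
    rendered = {
        f"deployment/{fetched['name']}": {"replicas": fetched["replicas"]},
        f"service/{fetched['name']}": {"port": 80},
    }
    if crash_after == "template":
        raise Crash("template")

    # deploy: apply every rendered object idempotently
    for key, spec in rendered.items():
        cluster[key] = spec
    if crash_after == "deploy":
        raise Crash("deploy")
    return cluster


def run_test_plan(desired, crash_after):
    """Crash once after the given step, restart, and return the final state."""
    cluster = {}
    try:
        reconcile(desired, cluster, crash_after=crash_after)
    except Crash:
        pass  # the controller restarts and reconciles again from the top
    reconcile(desired, cluster)
    return cluster


desired = {"name": "guestbook", "replicas": 3}
reference = reconcile(desired, {})  # crash-free run used as the oracle

for step in STEPS:
    final = run_test_plan(desired, crash_after=step)
    print(step, "ok" if final == reference else "DIVERGED")
```

Because every step in this toy is idempotent, no plan diverges. A real bug would show up as a plan whose final state differs from the reference, for example if the deploy step appended to existing state instead of overwriting it.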