nats-server icon indicating copy to clipboard operation
nats-server copied to clipboard

Chaos tests for Consumers

Open mprimi opened this issue 1 year ago • 7 comments

  • [ ] Link to issue, e.g. Resolves #NNN
  • [x] Documentation added (if applicable)
  • [x] Tests added
  • [x] Branch rebased on top of current main (git pull --rebase origin main)
  • [ ] Changes squashed to a single commit
  • [ ] Build is green in Travis CI
  • [x] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

Changes proposed in this pull request:

Add 4 "chaos" tests for

  • Durable consumer
  • Ordered consumer
  • Pull consumer
  • Default (async) consumer

These tests are trivial to pass under normal circumstances. In this case, a chaos monkey keeps restarting the cluster.

This is surfacing some issues that would otherwise go unnoticed (discussed separately)

/cc @nats-io/core

mprimi avatar Aug 04 '22 20:08 mprimi

@mprimi I would recommend that you run some check before posting the PR to make sure that it will pass the basic compile/vet/etc.. You can obviously rely on Travis run to do that for you, but I often prefer to check locally first.

I used to have a script to check things, but you can actually run (from the server root):

./scripts/runTestsOnTravis.sh compile

For instance, on this branch, you would get:

$ ./scripts/runTestsOnTravis.sh compile
# github.com/nats-io/nats-server/v2/server
server/jetstream_chaos_helpers_test.go:208:17: fmt.Sprintf format %d arg b.count is a func value, not called

which is what Travis is reporting. So you can fix those little things even before posting the PR and save some time.

kozlovic avatar Aug 04 '22 21:08 kozlovic

would recommend that you run some check before posting the PR

@kozlovic apologies for the noise. I am still not quite familiarized with the kinks of the go toolchain. In this case, that line hadn't changed in days, and I've built and run tests a hundred times, and the compiler never complained. Not sure how that's possible. Maybe VS Code built the object with flags that don't consider that an error, and subsequent shell builds reused the object. 🤷 I'll run the Travis script from now on before pushing.

mprimi avatar Aug 04 '22 21:08 mprimi

@mprimi No worries, and I hope you did not take it the wrong way, that was not the intent. This is just some methodology that I personally use, but again, the point of Travis is that it will run those tests/verifications and so it is fine to rely on that, but I personally prefer to run it locally.

kozlovic avatar Aug 04 '22 21:08 kozlovic

I hope you did not take it the wrong way

To the contrary, thank you. I didn't realize the travis script works locally, it's quite convenient.

mprimi avatar Aug 04 '22 21:08 mprimi

@mprimi I just realized that the chaos tests are running (locally) when one runs all tests such as go test -race -v -p=1 ./.... Although that makes sense since we want to run all tests, I wonder if those should be excluded by default and trigger only when running with a positive tag, does that makes sense? Meaning that the build tag should be: //go:build run_js_chaos_tests, and we would run with:

go test -race -v -p=1 ./server -tags=run_js_chaos_tests

If that is the case you could remove the skip_js_chaos_tests from where it was added.

Let me know what you think.

kozlovic avatar Aug 05 '22 03:08 kozlovic

also noticed that in runTestOnTravis, you reference skip_js_cluster_chaos_tests, but the tag is actually skip_js_chaos_tests.

kozlovic avatar Aug 05 '22 03:08 kozlovic

also noticed that in runTestOnTravis, you reference skip_js_cluster_chaos_tests

Oopsie! That means they are included in the js_tests travis target, which was not intended. Here's a PR to fix. In case that's inconveniencing someone: https://github.com/nats-io/nats-server/pull/3340

I wonder if those should be excluded by default and trigger only when running with a positive tag

Yes, it makes sense to me to exclude by default. I'll add to 3340 unless you want to merge that one ASAP.

mprimi avatar Aug 05 '22 16:08 mprimi

@kozlovic just rebased, build passing, ready for review whenever you have a chance

mprimi avatar Aug 16 '22 17:08 mprimi