nats-server
nats-server copied to clipboard
Chaos tests for Consumers
- [ ] Link to issue, e.g.
Resolves #NNN
- [x] Documentation added (if applicable)
- [x] Tests added
- [x] Branch rebased on top of current main (
git pull --rebase origin main
) - [ ] Changes squashed to a single commit
- [ ] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license
Changes proposed in this pull request:
Add 4 "chaos" tests for
- Durable consumer
- Ordered consumer
- Pull consumer
- Default (async) consumer
These tests are trivial to pass under normal circumstances. In this case, a chaos monkey keeps restarting the cluster.
This is surfacing some issues that would otherwise go unnoticed (discussed separately)
/cc @nats-io/core
@mprimi I would recommend that you run some check before posting the PR to make sure that it will pass the basic compile/vet/etc.. You can obviously rely on Travis run to do that for you, but I often prefer to check locally first.
I used to have a script to check things, but you can actually run (from the server root):
./scripts/runTestsOnTravis.sh compile
For instance, on this branch, you would get:
$ ./scripts/runTestsOnTravis.sh compile
# github.com/nats-io/nats-server/v2/server
server/jetstream_chaos_helpers_test.go:208:17: fmt.Sprintf format %d arg b.count is a func value, not called
which is what Travis is reporting. So you can fix those little things even before posting the PR and save some time.
would recommend that you run some check before posting the PR
@kozlovic apologies for the noise. I am still not quite familiarized with the kinks of the go toolchain. In this case, that line hadn't changed in days, and I've built and run tests a hundred times, and the compiler never complained. Not sure how that's possible. Maybe VS Code built the object with flags that don't consider that an error, and subsequent shell builds reused the object. 🤷 I'll run the Travis script from now on before pushing.
@mprimi No worries, and I hope you did not take it the wrong way, that was not the intent. This is just some methodology that I personally use, but again, the point of Travis is that it will run those tests/verifications and so it is fine to rely on that, but I personally prefer to run it locally.
I hope you did not take it the wrong way
To the contrary, thank you. I didn't realize the travis script works locally, it's quite convenient.
@mprimi I just realized that the chaos tests are running (locally) when one runs all tests such as go test -race -v -p=1 ./...
. Although that makes sense since we want to run all tests, I wonder if those should be excluded by default and trigger only when running with a positive tag, does that makes sense? Meaning that the build tag should be: //go:build run_js_chaos_tests
, and we would run with:
go test -race -v -p=1 ./server -tags=run_js_chaos_tests
If that is the case you could remove the skip_js_chaos_tests from where it was added.
Let me know what you think.
also noticed that in runTestOnTravis, you reference skip_js_cluster_chaos_tests
, but the tag is actually skip_js_chaos_tests
.
also noticed that in runTestOnTravis, you reference
skip_js_cluster_chaos_tests
Oopsie! That means they are included in the js_tests
travis target, which was not intended.
Here's a PR to fix. In case that's inconveniencing someone: https://github.com/nats-io/nats-server/pull/3340
I wonder if those should be excluded by default and trigger only when running with a positive tag
Yes, it makes sense to me to exclude by default. I'll add to 3340 unless you want to merge that one ASAP.
@kozlovic just rebased, build passing, ready for review whenever you have a chance