etcd Fix tx buffer inconsistency if there are unordered key writes in one tx.

Recommit https://github.com/etcd-io/etcd/pull/17228 with buf fix.

https://github.com/etcd-io/etcd/issues/17247 was fixed by reverting https://github.com/etcd-io/etcd/pull/17228. But the underlying problem with the buffer should still be fixed.

No failure locally with

EXPECT_DEBUG=true GO_TEST_FLAGS='-run=TestRobustness/Kubernetes/HighTraffic/ClusterOfSize3 --timeout=3000m --count=100 --failfast -v' RESULTS_DIR=./tmp/results make test-robustness

and just the [RaftAfterSaveSnapPanic, MemberReplace] Failpoints.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Jan 16 '24 22:01 siyuanfoundation

/retest

Jan 16 '24 23:01 siyuanfoundation

Hi @siyuanfoundation you can try to limit cpu by the following command and use patch #17248

EXPECT_DEBUG=true GO_TEST_FLAGS=${GO_TEST_FLAGS} RESULTS_DIR=/tmp/results taskset -c 0,1,2 make test-robustness

Hope it can help.

Jan 17 '24 02:01 fuweid

Hi @siyuanfoundation you can try to limit cpu by the following command and use patch #17248
EXPECT_DEBUG=true GO_TEST_FLAGS=${GO_TEST_FLAGS} RESULTS_DIR=/tmp/results taskset -c 0,1,2 make test-robustness
Hope it can help.

Thank you @fuweid, it did help!

Jan 17 '24 04:01 siyuanfoundation

cc @serathius @ahrtr @fuweid

Jan 17 '24 05:01 siyuanfoundation

Please read https://github.com/etcd-io/etcd/issues/17247#issuecomment-1895725142

Jan 18 '24 17:01 ahrtr

cc @ahrtr @serathius

Jan 25 '24 23:01 siyuanfoundation

/retest

Jan 26 '24 02:01 siyuanfoundation

The change looks good to me, it fixes a clear problem and adds regression tests, however https://github.com/etcd-io/etcd/pull/17228 did the same. Can we take some additional steps to ensure we prevent another regression?

I don't we can depend on reviewers trying to analyse and find edge cases as it failed before. One idea I have is to do comparison testing between buffered and unbuffered transaction. What do you think?

Jan 29 '24 16:01 serathius

cc @ahrtr Added benchmark test for writeback function Ran the test 3 times. The results are kind of noisy, but there is no clear indication of performance regression. Before the PR:

BenchmarkWritebackSeqBatches1BatchSize10000-8      	     100	 132904368, 137525118, 134134904  ns/op
BenchmarkWritebackSeqBatches10BatchSize1000-8      	     100	 135975057, 144034473, 138393072 ns/op
BenchmarkWritebackSeqBatches100BatchSize100-8      	     100	 145159646, 147396648, 148373301 ns/op
BenchmarkWritebackSeqBatches1000BatchSize10-8      	     100	 225271045, 225629845, 229453516 ns/op
BenchmarkWritebackNonSeqBatches1000BatchSize1-8    	     100	  29684442, 29051643, 31323272 ns/op
BenchmarkWritebackNonSeqBatches10000BatchSize1-8   	     100	1011180750, 1012368726, 1013996570 ns/op
BenchmarkWritebackNonSeqBatches100BatchSize10-8    	     100	  22771697, 22785210, 22529997 ns/op
BenchmarkWritebackNonSeqBatches1000BatchSize10-8   	     100	 233131758, 233625789, 238527133 ns/op

After the PR:

enchmarkWritebackSeqBatches1BatchSize10000-8      	     100	 136669713, 131847663, 130722632 ns/op
BenchmarkWritebackSeqBatches10BatchSize1000-8      	     100	 139880124, 139196486, 137805060 ns/op
BenchmarkWritebackSeqBatches100BatchSize100-8      	     100	 150674477, 146649422, 146416992 ns/op
BenchmarkWritebackSeqBatches1000BatchSize10-8      	     100	 226596374, 228097908, 223997282 ns/op
BenchmarkWritebackNonSeqBatches1000BatchSize1-8    	     100	  30098653, 30031651, 29341949 ns/op
BenchmarkWritebackNonSeqBatches10000BatchSize1-8   	     100	 990611851, 999573564, 994189677 ns/op
BenchmarkWritebackNonSeqBatches100BatchSize10-8    	     100	  22655668, 22641929, 21832902 ns/op
BenchmarkWritebackNonSeqBatches1000BatchSize10-8   	     100	 234623372, 233671979, 231825643 ns/op

Mar 25 '24 23:03 siyuanfoundation

The change looks good to me, it fixes a clear problem and adds regression tests, however #17228 did the same. Can we take some additional steps to ensure we prevent another regression?

I don't we can depend on reviewers trying to analyse and find edge cases as it failed before. One idea I have is to do comparison testing between buffered and unbuffered transaction. What do you think?

cc @serathius This bug happened because of the coincidence of Commit() timing between two different buckets, like in the tests I had to do two consecutive tx.Commit() to reproduce the bug. It is very hard to ensure there is no regression deterministically. In this case, we just have to be careful with more unit tests and rely on non-deterministic robustness test to detect the regression.

Mar 26 '24 03:03 siyuanfoundation

Ran the test 3 times. The results are kind of noisy, but there is no clear indication of performance regression. Before the PR:

Can you use benchstat ? https://sourcegraph.com/blog/go/gophercon-2019-optimizing-go-code-without-a-blindfold

Mar 26 '24 08:03 serathius

Ran the test 3 times. The results are kind of noisy, but there is no clear indication of performance regression. Before the PR:

Can you use benchstat ? https://sourcegraph.com/blog/go/gophercon-2019-optimizing-go-code-without-a-blindfold

Ok. I ran

go clean -testcache && go test -bench=. -count=10 -timeout 0 > benchmark_results.txt
benchstat benchmark_results.txt

Here are the results: (they are basically the same) Before

WritebackSeqBatches1BatchSize10000-8               128.2m ± 1%
WritebackSeqBatches10BatchSize1000-8               130.7m ± 1%
WritebackSeqBatches100BatchSize100-8               139.5m ± 1%
WritebackSeqBatches1000BatchSize10-8               220.6m ± 4%
WritebackNonSeqBatches1000BatchSize1-8             28.34m ± 2%
WritebackNonSeqBatches10000BatchSize1-8             1.002 ± 1%
WritebackNonSeqBatches100BatchSize10-8             21.64m ± 2%
WritebackNonSeqBatches1000BatchSize10-8            226.3m ± 1%

After

WritebackSeqBatches1BatchSize10000-8              128.3m ± 1%
WritebackSeqBatches10BatchSize1000-8              130.8m ± 3%
WritebackSeqBatches100BatchSize100-8              140.0m ± 1%
WritebackSeqBatches1000BatchSize10-8              216.3m ± 4%
WritebackNonSeqBatches1000BatchSize1-8            27.49m ± 2%
WritebackNonSeqBatches10000BatchSize1-8           966.6m ± 1%
WritebackNonSeqBatches100BatchSize10-8            20.47m ± 2%
WritebackNonSeqBatches1000BatchSize10-8           221.3m ± 1%

Mar 26 '24 22:03 siyuanfoundation

Here are the results: (they are basically the same)

Benchstat do not change the results, it just makes them readable and makes it easy to check if they are statistically significant

If you provide both old and new benchmark results to benchstat, it will provide also the performance delta and p-value

Mar 27 '24 08:03 serathius

Here are the results: (they are basically the same)

Benchstat do not change the results, it just makes them readable and makes it easy to check if they are statistically significant

If you provide both old and new benchmark results to benchstat, it will provide also the performance delta and p-value

Reran benchstat twice with count=10, run 1:

                                        │ benchmark_before.txt │        benchmark_after.txt         │
                                        │        sec/op        │   sec/op     vs base               │
WritebackSeqBatches1BatchSize10000-8               127.7m ± 1%   127.2m ± 3%       ~ (p=0.796 n=10)
WritebackSeqBatches10BatchSize1000-8               129.9m ± 6%   129.9m ± 1%       ~ (p=0.971 n=10)
WritebackSeqBatches100BatchSize100-8               138.6m ± 1%   137.0m ± 1%  -1.17% (p=0.029 n=10)
WritebackSeqBatches1000BatchSize10-8               215.7m ± 1%   212.7m ± 4%       ~ (p=0.052 n=10)
WritebackNonSeqBatches1000BatchSize1-8             28.48m ± 2%   28.05m ± 2%  -1.48% (p=0.035 n=10)
WritebackNonSeqBatches10000BatchSize1-8            981.0m ± 1%   949.6m ± 0%  -3.20% (p=0.000 n=10)
WritebackNonSeqBatches100BatchSize10-8             20.34m ± 1%   21.21m ± 5%  +4.27% (p=0.000 n=10)
WritebackNonSeqBatches1000BatchSize10-8            216.3m ± 1%   216.4m ± 4%       ~ (p=0.796 n=10)

run 2:

                                        │ benchmark_before_0.txt │       benchmark_after_0.txt        │
                                        │         sec/op         │   sec/op     vs base               │
WritebackSeqBatches1BatchSize10000-8                 127.8m ± 3%   130.0m ± 2%  +1.73% (p=0.023 n=10)
WritebackSeqBatches10BatchSize1000-8                 127.2m ± 1%   130.9m ± 3%  +2.95% (p=0.001 n=10)
WritebackSeqBatches100BatchSize100-8                 135.4m ± 1%   139.2m ± 3%  +2.76% (p=0.001 n=10)
WritebackSeqBatches1000BatchSize10-8                 212.4m ± 2%   213.1m ± 0%       ~ (p=0.165 n=10)
WritebackNonSeqBatches1000BatchSize1-8               27.82m ± 2%   28.80m ± 3%  +3.51% (p=0.001 n=10)
WritebackNonSeqBatches10000BatchSize1-8              970.1m ± 2%   951.8m ± 1%  -1.89% (p=0.000 n=10)
WritebackNonSeqBatches100BatchSize10-8               20.23m ± 1%   20.39m ± 5%       ~ (p=0.165 n=10)
WritebackNonSeqBatches1000BatchSize10-8              217.1m ± 1%   216.6m ± 1%       ~ (p=0.529 n=10)

Mar 28 '24 02:03 siyuanfoundation

So in the updated benchmark results you can see that it causes ~2% regression in performance. Still I think impact on overall etcd performance should be not noticable and we should pick correctness here.

Mar 28 '24 11:03 serathius

etcd etcd copied to clipboard

Fix tx buffer inconsistency if there are unordered key writes in one tx.

etcd
etcd copied to clipboard