etcd icon indicating copy to clipboard operation
etcd copied to clipboard

[Robustness tests] Reduce runtime

Open serathius opened this issue 2 years ago • 15 comments

What would you like to be added?

There was a recent runtime increase for linearizability tests. image

Execution time almost doubled, for example for presubmit it went from ~18m to 30m and for nightly it went from 2h to 3h.

My main expectation is because of growing complexity of etcd model. Based on my experience running the test, recent improvements have reduced variance of run time, however it increased overall time.

For now I totally ignored performance for simplicity (like use json to serialize model), however we should start looking into low hanging fruit and fix it.

My recommendation would be to start profiling to identify places for improvement.

Why is this needed?

Increased run time started causing failures in run. 30 minutes for presubmit is also too long.

serathius avatar Jan 12 '23 09:01 serathius

cc @tjungblu You were interested in etcd model performance.

serathius avatar Jan 12 '23 09:01 serathius

I would like to help. @serathius would you mind to assign it to me? Thanks

fuweid avatar Jan 12 '23 10:01 fuweid

Sure, thanks for offering. I did some simple investigation by collecting pprof from github action in https://github.com/serathius/etcd/actions/runs/3901079392 (available in artifacts file)

Maybe it would be useful

serathius avatar Jan 12 '23 10:01 serathius

image

It showed that json took ~27% of cpu time. However one discrepancy is that it recorded only 40s from 15m github action. Either I didn't collect profile correctly or there are other things that should be optimized.

serathius avatar Jan 12 '23 11:01 serathius

@fuweid feel free to pick this up, I've spent some time last year to move porcupine to generics in: https://github.com/tjungblu/porcupine

That way you can skip the json serde and without the casting it's a bit quicker in general. If the checking itself is taking longer due to an exponential search space increase, it's not going to help much however ;)

tjungblu avatar Jan 12 '23 11:01 tjungblu

If the checking itself is taking longer due to an exponential search space increase, it's not going to help much however ;)

Overall you are right, however with https://github.com/etcd-io/etcd/pull/15078 chances that space grows exponentially were greatly reduced.

serathius avatar Jan 12 '23 12:01 serathius

Interesting I found one seemingly unrelated culprit. With network proxy enabled 3 member clusters takes 8 seconds to shutdown.

serathius avatar Jan 12 '23 13:01 serathius

This should already help https://github.com/etcd-io/etcd/pull/15091

serathius avatar Jan 12 '23 14:01 serathius

Still investigating the cause.

And the github action page shows total duration including pending time, which is not correct. Wrote a tool to dump the job run duration. Parse the result here.

Click to check!
go run main.go --owner etcd-io --repo etcd run list --workflow-id 36262010 --created 2022-12-01..2023-01-30 --jobs "test / test" --jobs "test" --event push --status success
ID          NAME             EVENT  HEAD(MESSAGE)                                                                     STATUS   JOB(test / test)  JOB(test)  CREATED
4044118009  Linearizability  push   Merge pull request #15191 from serathius/linearizability-failpoints-timeout       success  13m42s            N/A        Mon Jan 30 13:05:34 UTC 2023
4041995481  Linearizability  push   tools: fix the CSV format error and add build target 'tools' in Makefile          success  14m26s            N/A        Mon Jan 30 08:35:48 UTC 2023
4041504776  Linearizability  push   fix(client):getToken automatically when the token has expired                     success  13m32s            N/A        Mon Jan 30 07:20:05 UTC 2023
4040377539  Linearizability  push   CHANGELOG: add item about improvement for concurrent read                         success  14m58s            N/A        Mon Jan 30 03:43:59 UTC 2023
4033814019  Linearizability  push   test: reorder the import items in linearizability/model/model.go                  success  13m12s            N/A        Sat Jan 28 22:12:07 UTC 2023
4020868209  Linearizability  push   etcdserver: correct the old name of notifyc in comments                           success  14m9s             N/A        Fri Jan 27 02:02:28 UTC 2023
4000937645  Linearizability  push   Merge pull request #15180 from serathius/linearizability-large-put                success  14m14s            N/A        Tue Jan 24 22:23:03 UTC 2023
4000537796  Linearizability  push   Merge pull request #15178 from serathius/linearizability-defrag                   success  13m30s            N/A        Tue Jan 24 21:23:58 UTC 2023
3999848587  Linearizability  push   Change verb and use switch stmt                                                   success  12m52s            N/A        Tue Jan 24 19:47:23 UTC 2023
3999604131  Linearizability  push   Merge pull request #15179 from serathius/linearizability-deduplicate-scenarios    success  13m9s             N/A        Tue Jan 24 19:11:59 UTC 2023
3999310141  Linearizability  push   CHANGELOG: add item about improvement for limit range                             success  13m12s            N/A        Tue Jan 24 18:33:12 UTC 2023
3999128047  Linearizability  push   Merge pull request #15172 from dims/add-functional-tests-on-arm64                 success  13m40s            N/A        Tue Jan 24 18:09:43 UTC 2023
3997926134  Linearizability  push   Merge pull request #15176 from serathius/linearizability-cluster-stop             success  12m59s            N/A        Tue Jan 24 15:48:10 UTC 2023
3996330157  Linearizability  push   Merge pull request #15159 from serathius/linearizability-separate-request-type    success  13m5s             N/A        Tue Jan 24 12:48:02 UTC 2023
3994529219  Linearizability  push   dependency: bump github.com/google/addlicense from 1.1.0 to 1.1.1                 success  12m40s            N/A        Tue Jan 24 08:59:51 UTC 2023
3991720260  Linearizability  push   build(deps): bump golangci/golangci-lint-action from 3.3.1 to 3.4.0               success  13m15s            N/A        Tue Jan 24 00:10:47 UTC 2023
3988840683  Linearizability  push   build(deps): bump golangci/golangci-lint-action from 3.3.1 to 3.4.0               success  14m16s            N/A        Mon Jan 23 17:16:20 UTC 2023
3988771639  Linearizability  push   build(deps): bump github.com/coreos/go-semver from 0.3.0 to 0.3.1                 success  13m16s            N/A        Mon Jan 23 17:08:24 UTC 2023
3988744924  Linearizability  push   build(deps): bump github.com/google/addlicense in /tools/mod                      success  13m37s            N/A        Mon Jan 23 17:05:20 UTC 2023
3988564469  Linearizability  push   Merge pull request #15158 from sashamelentyev/cleanup-2                           success  13m26s            N/A        Mon Jan 23 16:44:24 UTC 2023
3985429387  Linearizability  push   changelog: Set release date for v3.5.7                                            success  13m35s            N/A        Mon Jan 23 10:22:01 UTC 2023
3966850271  Linearizability  push   build(deps): bump github/codeql-action from 2.1.38 to 2.1.39                      success  13m12s            N/A        Fri Jan 20 10:36:00 UTC 2023
3966357790  Linearizability  push   Merge pull request #15149 from serathius/linearizability-artifact-name            success  13m46s            N/A        Fri Jan 20 09:27:22 UTC 2023
3966076803  Linearizability  push   build(deps): bump github/codeql-action from 2.1.38 to 2.1.39                      success  13m15s            N/A        Fri Jan 20 08:47:42 UTC 2023
3966073521  Linearizability  push   Merge pull request #15142 from serathius/linearizability-nightly-fix2             success  13m31s            N/A        Fri Jan 20 08:47:11 UTC 2023
3956826236  Linearizability  push   Merge pull request #15146 from Niravprajapati1/typo-fix                           success  10m36s            N/A        Thu Jan 19 08:55:49 UTC 2023
3952945944  Linearizability  push   Merge pull request #15130 from guozhao-coder/replace-timeAfter                    success  10m26s            N/A        Wed Jan 18 21:09:25 UTC 2023
3948927616  Linearizability  push   Merge pull request #14682 from serathius/issue13766                               success  13m50s            N/A        Wed Jan 18 12:37:52 UTC 2023
3945147091  Linearizability  push   Merge pull request #15135 from fuweid/update-3.4-changelog                        success  25m10s            N/A        Wed Jan 18 02:30:17 UTC 2023
3937942188  Linearizability  push   Merge pull request #15132 from serathius/linearizability-nightly-fix              success  25m24s            N/A        Tue Jan 17 09:40:59 UTC 2023
3937801834  Linearizability  push   Merge pull request #15098 from ahrtr/changelog_promote_learner_20230113           success  26m3s             N/A        Tue Jan 17 09:23:30 UTC 2023
3937305327  Linearizability  push   Merge pull request #15128 from serathius/linearizability-move-model               success  25m38s            N/A        Tue Jan 17 08:17:35 UTC 2023
3934624945  Linearizability  push   Merge pull request #15129 from ahrtr/bump_dependency_20230117                     success  25m46s            N/A        Mon Jan 16 23:16:51 UTC 2023
3934600548  Linearizability  push   Merge pull request #15095 from ahrtr/promote_non_exist_id_20230113                success  25m27s            N/A        Mon Jan 16 23:12:33 UTC 2023
3934506453  Linearizability  push   build(deps): bump github/codeql-action from 2.1.37 to 2.1.38                      success  25m21s            N/A        Mon Jan 16 22:55:16 UTC 2023
3932564589  Linearizability  push   build(deps): bump github/codeql-action from 2.1.37 to 2.1.38                      success  25m23s            N/A        Mon Jan 16 17:11:44 UTC 2023
3932528523  Linearizability  push   build(deps): bump google.golang.org/grpc from 1.51.0 to 1.52.0                    success  25m34s            N/A        Mon Jan 16 17:07:13 UTC 2023
3932526024  Linearizability  push   build(deps): bump github.com/dustin/go-humanize from 1.0.0 to 1.0.1               success  24m50s            N/A        Mon Jan 16 17:06:52 UTC 2023
3932509655  Linearizability  push   build(deps): bump gotest.tools/gotestsum in /tools/mod                            success  24m56s            N/A        Mon Jan 16 17:05:00 UTC 2023
3932508989  Linearizability  push   build(deps): bump github.com/mikefarah/yq/v4 in /tools/mod                        success  24m42s            N/A        Mon Jan 16 17:04:53 UTC 2023
3929603335  Linearizability  push   Merge pull request #15106 from serathius/linearizability-nightly-release          success  25m43s            N/A        Mon Jan 16 10:47:43 UTC 2023
3925799760  Linearizability  push   Merge pull request #15071 from wafuwafu13/refactor-gettoken                       success  N/A               26m14s     Sun Jan 15 23:26:37 UTC 2023
3920743133  Linearizability  push   Merge pull request #15105 from serathius/linearizability-nightly-timeout          success  N/A               26m10s     Sat Jan 14 23:21:46 UTC 2023
3917665954  Linearizability  push   Merge pull request #15101 from serathius/linearizability-history-patch-window     success  N/A               25m54s     Sat Jan 14 08:20:30 UTC 2023
3909555699  Linearizability  push   Merge pull request #15093 from geetasg/lease_support_review_feedback              success  N/A               25m28s     Fri Jan 13 08:16:10 UTC 2023
3908224610  Linearizability  push   Merge pull request #15094 from fuweid/changelog-update                            success  N/A               25m34s     Fri Jan 13 03:49:17 UTC 2023
3906662264  Linearizability  push   Merge pull request #15091 from serathius/linearizability-stop-proxy               success  N/A               26m16s     Thu Jan 12 22:47:12 UTC 2023
3902399088  Linearizability  push   Merge pull request #15084 from serathius/linearizability-lease-timeout            success  N/A               28m28s     Thu Jan 12 12:50:03 UTC 2023
3900752690  Linearizability  push   Merge pull request #15078 from serathius/linearizability-patch-operations         success  N/A               29m22s     Thu Jan 12 09:09:06 UTC 2023
3893794827  Linearizability  push   Merge pull request #15080 from geetasg/lease_support_refactor                     success  N/A               29m59s     Wed Jan 11 14:34:56 UTC 2023
3891730784  Linearizability  push   Merge pull request #15068 from ahrtr/last_wal_rec_corrupt_main_20220107           success  N/A               27m49s     Wed Jan 11 10:07:04 UTC 2023
3891062783  Linearizability  push   Merge pull request #15070 from ahrtr/changelog_wal_record_partially_synced_20230  success  N/A               28m0s      Wed Jan 11 08:38:07 UTC 2023
3885506384  Linearizability  push   Merge pull request #15044 from serathius/linearizability-watch                    success  N/A               29m37s     Tue Jan 10 16:59:10 UTC 2023
3881435488  Linearizability  push   Merge pull request #15046 from serathius/linearizability-failpoints-available     success  N/A               22m1s      Tue Jan 10 08:15:37 UTC 2023
3879547454  Linearizability  push   Merge pull request #15073 from etcd-io/dependabot/go_modules/server/golang.org/x  success  N/A               22m39s     Tue Jan 10 02:14:50 UTC 2023
3879238534  Linearizability  push   tidy up dependencies                                                              success  N/A               23m7s      Tue Jan 10 01:13:09 UTC 2023
3879210053  Linearizability  push   Merge pull request #15074 from etcd-io/dependabot/github_actions/actions/upload-  success  N/A               24m13s     Tue Jan 10 01:08:05 UTC 2023
3879193383  Linearizability  push   Merge pull request #15075 from etcd-io/dependabot/github_actions/actions/checkou  success  N/A               21m54s     Tue Jan 10 01:05:09 UTC 2023
3876075434  Linearizability  push   build(deps): bump actions/checkout from 3.2.0 to 3.3.0                            success  N/A               23m8s      Mon Jan  9 17:12:55 UTC 2023
3876074203  Linearizability  push   build(deps): bump actions/upload-artifact from 3.1.1 to 3.1.2                     success  N/A               23m42s     Mon Jan  9 17:12:48 UTC 2023
3874582102  Linearizability  push   Merge pull request #15059 from serathius/linearizability-operations               success  N/A               22m9s      Mon Jan  9 14:12:27 UTC 2023
3874420966  Linearizability  push   Merge pull request #15048 from serathius/linearizability-network                  success  N/A               22m25s     Mon Jan  9 13:52:01 UTC 2023
3861570770  Linearizability  push   Merge pull request #15067 from halegreen/fix-comment-typo                         success  N/A               18m25s     Sat Jan  7 10:02:46 UTC 2023
3843945339  Linearizability  push   Merge pull request #15033 from chaochn47/snapshotCatchupEntries                   success  N/A               17m51s     Thu Jan  5 04:47:43 UTC 2023
3805686335  Linearizability  push   Merge pull request #15055 from ptabor/20221228-fix-dump-logs-tests                success  N/A               17m37s     Fri Dec 30 08:17:41 UTC 2022
3804333929  Linearizability  push   Merge pull request #15056 from ptabor/20221218-goimport-tests                     success  N/A               17m22s     Fri Dec 30 02:18:16 UTC 2022
3803281539  Linearizability  push   Merge pull request #14924 from serathius/linearizability-multiple-keys            success  N/A               17m1s      Thu Dec 29 21:48:03 UTC 2022
3799935126  Linearizability  push   Merge pull request #15052 from ptabor/20221228-goimports-fix                      success  N/A               16m54s     Thu Dec 29 10:31:38 UTC 2022
3799380964  Linearizability  push   Merge pull request #15053 from wafuwafu13/show-as-learner                         success  N/A               16m44s     Thu Dec 29 08:44:30 UTC 2022
3798903170  Linearizability  push   Merge pull request #15054 from fuweid/cleanup-old-build-tags                      success  N/A               16m14s     Thu Dec 29 07:00:33 UTC 2022
3796026288  Linearizability  push   Merge pull request #15045 from serathius/linearizability-model                    success  N/A               16m56s     Wed Dec 28 19:01:11 UTC 2022
3794229591  Linearizability  push   Merge pull request #15051 from JOJO0527/patch-2                                   success  N/A               16m38s     Wed Dec 28 13:14:26 UTC 2022
3792656075  Linearizability  push   Merge pull request #15050 from ahrtr/fix_rate_limit_20221228                      success  N/A               16m40s     Wed Dec 28 08:15:13 UTC 2022
3783950650  Linearizability  push   Merge pull request #15047 from etcd-io/dependabot/github_actions/ossf/scorecard-  success  N/A               30m10s     Mon Dec 26 23:12:28 UTC 2022
3782563023  Linearizability  push   build(deps): bump ossf/scorecard-action from 2.1.0 to 2.1.2                       success  N/A               17m17s     Mon Dec 26 17:18:55 UTC 2022
3778557346  Linearizability  push   Merge pull request #15021 from ramil600/em_manager                                success  N/A               17m38s     Mon Dec 26 02:19:31 UTC 2022
3766287251  Linearizability  push   Merge pull request #15040 from ahrtr/update_nsswitch                              success  N/A               14m2s      Fri Dec 23 14:21:08 UTC 2022
3765546303  Linearizability  push   Merge pull request #15039 from ahrtr/changelog_remove_busybox_20221223            success  N/A               13m9s      Fri Dec 23 12:13:07 UTC 2022
3764861752  Linearizability  push   Merge pull request #15036 from ahrtr/remove_busybox_20221223                      success  N/A               14m41s     Fri Dec 23 10:09:44 UTC 2022
3747025198  Linearizability  push   changelog: update the release date of etcd v3.4.23                                success  N/A               27m22s     Wed Dec 21 06:37:06 UTC 2022
3744748489  Linearizability  push   Merge pull request #15031 from ahrtr/SnapshotWithVersion_nil_panic_20221220       success  N/A               14m8s      Tue Dec 20 22:45:35 UTC 2022
3744736866  Linearizability  push   Merge pull request #15024 from wafuwafu13/client-isunavailableerr-test            success  N/A               14m57s     Tue Dec 20 22:43:01 UTC 2022
3737620086  Linearizability  push   Merge pull request #15023 from unai-ttxu/task/fix_etcdctl_member_list_hex_consis  success  N/A               14m5s      Tue Dec 20 04:52:53 UTC 2022
3736918148  Linearizability  push   Merge pull request #15012 from wafuwafu13/credentials-test                        success  N/A               13m34s     Tue Dec 20 02:25:22 UTC 2022
3736826432  Linearizability  push   Merge pull request #15026 from etcd-io/dependabot/github_actions/ossf/scorecard-  success  N/A               12m57s     Tue Dec 20 02:08:10 UTC 2022
3736644532  Linearizability  push   build(deps): bump ossf/scorecard-action from 2.0.6 to 2.1.0                       success  N/A               13m40s     Tue Dec 20 01:29:12 UTC 2022
3736633398  Linearizability  push   Merge pull request #15027 from etcd-io/dependabot/github_actions/github/codeql-a  success  N/A               14m3s      Tue Dec 20 01:27:00 UTC 2022
3736627663  Linearizability  push   Merge pull request #15025 from etcd-io/dependabot/go_modules/tools/mod/github.co  success  N/A               14m24s     Tue Dec 20 01:25:44 UTC 2022
3733587372  Linearizability  push   build(deps): bump github/codeql-action from 2.1.36 to 2.1.37                      success  N/A               13m40s     Mon Dec 19 17:09:23 UTC 2022
3733587190  Linearizability  push   build(deps): bump ossf/scorecard-action from 2.0.6 to 2.1.0                       success  N/A               13m32s     Mon Dec 19 17:09:22 UTC 2022
3733540402  Linearizability  push   build(deps): bump github.com/mikefarah/yq/v4 in /tools/mod                        success  N/A               13m27s     Mon Dec 19 17:04:08 UTC 2022
3731324339  Linearizability  push   Merge pull request #15020 from ahrtr/changelog_security_20221219                  success  N/A               13m46s     Mon Dec 19 12:18:04 UTC 2022
3730138842  Linearizability  push   Merge pull request #14993 from ramil600/add-log                                   success  N/A               13m11s     Mon Dec 19 09:30:54 UTC 2022
3729704702  Linearizability  push   Merge pull request #14875 from BarrensZeppelin/patch-lessor                       success  N/A               14m1s      Mon Dec 19 08:28:54 UTC 2022
3722089345  Linearizability  push   Merge pull request #14918 from fuweid/revert-pr-12129                             success  N/A               14m10s     Sat Dec 17 22:38:05 UTC 2022
3718841974  Linearizability  push   Merge pull request #14998 from chaochn47/auth_test_split                          success  N/A               14m44s     Sat Dec 17 06:23:50 UTC 2022
3710926167  Linearizability  push   Merge pull request #14999 from fuweid/deflake-14926                               success  N/A               13m52s     Fri Dec 16 06:42:37 UTC 2022
3710629284  Linearizability  push   Merge pull request #15005 from ArkaSaha30/trivy-nightly-scan                      success  N/A               13m57s     Fri Dec 16 05:41:50 UTC 2022
3710107846  Linearizability  push   Merge pull request #15004 from j13tw/fix-doc                                      success  N/A               13m50s     Fri Dec 16 03:48:15 UTC 2022
3709637997  Linearizability  push   Merge pull request #15000 from wafuwafu13/server-access-control-test              success  N/A               14m15s     Fri Dec 16 02:10:07 UTC 2022

fuweid avatar Jan 30 '23 15:01 fuweid

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 21 '23 17:05 stale[bot]

@serathius Is the issue still valid after #15078 #15242?

@fuweid feel free to pick this up, I've spent some time last year to move porcupine to generics in: https://github.com/tjungblu/porcupine

That way you can skip the json serde and without the casting it's a bit quicker in general.

Moving porcupine to generics seems another area to reduce the runtime.

chaochn47 avatar Jun 14 '23 00:06 chaochn47

Issue is still valid, however I think we need to change the approach. Profiling done by me looks incorrect. Don't think rewrite to generics will help. We can make the code faster, but I don't think it will reduce the test runtime. I think we lack proper instrumentation of tests to measure runtime and the contributing factors.

I'm thinking about collecting reports about how much each tests took. Similar to what we do for flakiness measurement. The test should output junit report, that we can later aggregate over hundreds of runs, and calculate some statistics instead of eyeballing the useless Github action page.

serathius avatar Jun 14 '23 07:06 serathius

Thanks @serathius! I think it's a good idea to first narrow down if a single test case takes the majority of the runtime or it's a universe pattern in robustness test. After that, we might use profiling to identify which code path is unexpectedly slow. Created an issue with background to track this ^

chaochn47 avatar Jun 14 '23 16:06 chaochn47

Don't think rewrite to generics will help. We can make the code faster, but I don't think it will reduce the test runtime.

It won't help that much, IIRC the difference was about 20% at best. There was a trade-off between copying the state at every step (json serde) and doing a plain memcopy of a struct (plus the casting overheads).

Especially if the culprit is slow GH action CPUs. +1 on the instrumentation though. I'm still a big fan of the pebble benchmark page: https://cockroachdb.github.io/pebble/?max=local

Maybe we can come up with something similar, obviously for test suite runtimes first. Even though I expect significant noise/variance from the action runners.

tjungblu avatar Jun 14 '23 17:06 tjungblu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 17 '23 23:09 stale[bot]