go-libp2p speed up CI

CI is taking a long time to run our builds. We should

split up our builds. There's no need to run go test and go test -race sequentially. These can be separate jobs.
use self-hosted runners running on beefier hardware, and without the concurrency restrictions that GitHub Actions imposes

Jul 01 '22 18:07 marten-seemann

I can't assign myself or the ipdx tag yet, but we're looking into this,

I created a PR to show how we could run the test in separate jobs: https://github.com/libp2p/go-libp2p/pull/1639

We need to investigate some more and make sure this would be an improvement:

If we're waiting for 12 runners instead of 6, we lose the benefits of parallelization, and the CI would be slower,
Self-hosted runners takes time to schedule and start, so this might not be the right fix either.

(cc @galargh)

Jul 05 '22 15:07 laurentsenta

We should also see if there are any test cases that are particularly slow. This is not a replacement for optimizing our CI setup, but it might help nonetheless.

I found this tutorial in the Google cache: https://webcache.googleusercontent.com/search?q=cache:5NYzDwHIEj8J:https://leighmcculloch.com/posts/go-find-slow-tests/&cd=3&hl=en&ct=clnk&gl=il

Step 1: Run the tests with verbose json output, without caching.

go test -count=1 -v -json | tee testoutput.txt

Step 2: Extract from that output a sorted list of tests and their elapsed times.

cat testoutput.txt \
  | jq -r 'select(.Action == "pass" and .Test != null) | .Test + "," + (.Elapsed | tostring)' \
  | sort -k2 -n -t,

Aug 15 '22 10:08 marten-seemann

These are all the tests taking longer than 0.5s (unfortunately, without the package names). The number behind the test name is the duration in seconds.

TestConstraintsCleanup,0.5
TestResolutionMisaligned,0.51
TestTcpTransport/github.com/libp2p/go-libp2p/p2p/transport/testsuite.SubtestStreamOpenStress#01,0.56
TestFuzzManyPeers,0.57
TestListenerCloseClosesQueued,0.6
TestTcpTransport/github.com/libp2p/go-libp2p/p2p/transport/testsuite.SubtestStreamOpenStress,0.61
TestFailuresOnResponder,0.64
TestInMemoryPeerstore/Metadata,0.67
TestInMemoryPeerstore/Metadata/putting_and_getting,0.67
TestDsPeerstore/Leveldb/Metadata,0.72
TestDsPeerstore/Leveldb/Metadata/putting_and_getting,0.72
TestBackoffConnector,0.74
TestStressActiveDial,0.89
TestAutoNATPublic,1
TestDeadlines,1
TestDialWorkerLoopConcurrentFailure,1
TestDialWorkerLoopFailure,1
TestHostAddrChangeDetection,1
TestLocalIPChangesWhenListenAddrChanges,1
TestAutoNATPrivate,1.01
TestIdentifyDeltaWhileIdentifyingConn,1.01
TestDefaultTransportWithMemoryLimit/github.com/libp2p/go-libp2p/p2p/muxer/testsuite.SubtestStreamOpenStress,1.06
TestDefaultTransportWithMemoryLimit,1.46
TestStreamsWithLatency,1.5
TestDefaultTransport/github.com/libp2p/go-libp2p/p2p/muxer/testsuite.SubtestStreamOpenStress,1.57
TestObsAddrSet,1.57
TestConnectionGating,1.59
TestTcpTransport,1.82
TestDsPeerstore/Leveldb/BasicPeerstore,1.91
TestDefaultTransport/github.com/libp2p/go-libp2p/p2p/muxer/testsuite.SubtestStreamOpenStress,1.95
TestDialWorkerLoopConcurrentFailureStress,2
TestRelayLimitTime,2.01
TestAutoNATPublictoPrivate,2.02
TestDefaultTransport,2.02
TestObservedAddrFiltering,2.05
TestStatelessReset,2.13
TestInMemoryPeerstore/BasicPeerstore,2.14
TestDefaultTransport,2.41
TestDsKeyBook/Leveldb/PeersWithKeys,2.47
TestInMemoryKeyBook/PeersWithKeys,2.49
TestDsPeerstore/Leveldb,2.71
TestDsPeerstore,2.72
TestLimitedStreams,2.73
TestInMemoryKeyBook,2.81
TestInMemoryPeerstore,2.91
TestStressLimiter,2.99
TestQuickBurstRespectsSilencePeriod,3
TestDsKeyBook,3.06
TestDsKeyBook/Leveldb,3.06

Aug 15 '22 10:08 marten-seemann

I just discovered https://github.com/pl-strflt/tf-aws-gh-runner. @galargh What would need to be done to use these runners for go-libp2p? It looks like it would clash with Unified CI, since the runs-on is hardcoded.

Jan 13 '23 03:01 marten-seemann

I just discovered https://github.com/pl-strflt/tf-aws-gh-runner. @galargh What would need to be done to use these runners for go-libp2p? It looks like it would clash with Unified CI, since the runs-on is hardcoded.

Let's make them configurable! https://github.com/protocol/.github/pull/443

GitHub just announced configuration variables which, in my opinion, are a pretty good fit for this.

Jan 13 '23 08:01 galargh

Any chance we can pick this up at some point? Slow CI probably slowed me down by 1-2h while cutting the v0.27.0 release. The most important thing would be to dramatically increase horizontal scaling. When merging multiple PRs, jobs get queued up for a very long time, since no runners are available.

Apr 06 '23 08:04 marten-seemann

I just put ubuntu jobs from Go Test workflow on self-hosted GHA runner. Self-hosted runners are sourced from a separate pool which should free up the resources for other jobs. Additionally, it speeds up the execution of the job ~13m -> ~6m.

This change will only affect the runs initiated in this repository (i.e. workflow runs triggered on pull_request from forks will keep running on GH machines).

If we happen to have to revert the change, here are the instructions:

Go to https://github.com/libp2p/go-libp2p/settings/variables/actions
Delete UCI_GO_TEST_RUNNER_UBUNTU Repository variables

Next steps from the IPDX side, prepare windows runners for use in Go Test workflow.

Apr 06 '23 10:04 galargh

Thank you @galargh, this is great!

Is the UCI_GO_TEST_RUNNER_UBUNTU format documented anywhere? I'm struggling making sense of it:

Apr 07 '23 03:04 marten-seemann

Good catch, we didn't mention the format in https://github.com/protocol/.github#configuration-variables before. Now I added it in https://github.com/protocol/.github/commit/43e27921600f05276a1259a9d9d67cdb9ec4e905.

UCI_*_RUNNER_* variables expect the values to be JSON formatted. For example, if you want the MacOS runner used in Go Test workflow to be macos-12 specifically, you'd set UCI_GO_TEST_RUNNER_MACOS to "macos-12" (notice the " around the string); and if you want your Ubuntu runner to be a self hosted machine with labels this, is, my, self-hosted, runner, you'd set UCI_GO_TEST_RUNNER_UBUNTU to ["this", "is", "my", "self-hosted", "runner"].

Apr 07 '23 07:04 galargh

What are these labels? Each label (except for the "go-libp2p") makes sense on its own, but I don't understand how they compose.

Apr 07 '23 07:04 marten-seemann

We use https://github.com/philips-labs/terraform-aws-github-runner to set up our self-hosted runners. It gives the runners os, architecture and self-hosted labels by default.

Additionally, we label our self-hosted runners with a runner group they belong to, see https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/runners.tf#L150. In this case, the runner group we're using is go-libp2p defined at https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/runners.tf#L80

As you can see by looking at the available Runner Types, we've been grouping the runners by repository so far but as the solution gains popularity, I'm thinking it'd make for an easier set up to group the runner types by sizes and only break out of this scheme if some repository really needs a dedicated pool of runners.

Apr 07 '23 07:04 galargh

I added size-based groups to our self-hosted solution. The labels in this repo now read: ['self-hosted', 'linux', 'x64', '2xlarge'].

Apr 07 '23 09:04 galargh

go-libp2p go-libp2p copied to clipboard

speed up CI

go-libp2p
go-libp2p copied to clipboard