go-libp2p
go-libp2p copied to clipboard
speed up CI
CI is taking a long time to run our builds. We should
- split up our builds. There's no need to run
go testandgo test -racesequentially. These can be separate jobs. - use self-hosted runners running on beefier hardware, and without the concurrency restrictions that GitHub Actions imposes
I can't assign myself or the ipdx tag yet, but we're looking into this,
I created a PR to show how we could run the test in separate jobs: https://github.com/libp2p/go-libp2p/pull/1639
We need to investigate some more and make sure this would be an improvement:
- If we're waiting for 12 runners instead of 6, we lose the benefits of parallelization, and the CI would be slower,
- Self-hosted runners takes time to schedule and start, so this might not be the right fix either.
(cc @galargh)
We should also see if there are any test cases that are particularly slow. This is not a replacement for optimizing our CI setup, but it might help nonetheless.
I found this tutorial in the Google cache: https://webcache.googleusercontent.com/search?q=cache:5NYzDwHIEj8J:https://leighmcculloch.com/posts/go-find-slow-tests/&cd=3&hl=en&ct=clnk&gl=il
Step 1: Run the tests with verbose json output, without caching.
go test -count=1 -v -json | tee testoutput.txt
Step 2: Extract from that output a sorted list of tests and their elapsed times.
cat testoutput.txt \
| jq -r 'select(.Action == "pass" and .Test != null) | .Test + "," + (.Elapsed | tostring)' \
| sort -k2 -n -t,
These are all the tests taking longer than 0.5s (unfortunately, without the package names). The number behind the test name is the duration in seconds.
TestConstraintsCleanup,0.5
TestResolutionMisaligned,0.51
TestTcpTransport/github.com/libp2p/go-libp2p/p2p/transport/testsuite.SubtestStreamOpenStress#01,0.56
TestFuzzManyPeers,0.57
TestListenerCloseClosesQueued,0.6
TestTcpTransport/github.com/libp2p/go-libp2p/p2p/transport/testsuite.SubtestStreamOpenStress,0.61
TestFailuresOnResponder,0.64
TestInMemoryPeerstore/Metadata,0.67
TestInMemoryPeerstore/Metadata/putting_and_getting,0.67
TestDsPeerstore/Leveldb/Metadata,0.72
TestDsPeerstore/Leveldb/Metadata/putting_and_getting,0.72
TestBackoffConnector,0.74
TestStressActiveDial,0.89
TestAutoNATPublic,1
TestDeadlines,1
TestDialWorkerLoopConcurrentFailure,1
TestDialWorkerLoopFailure,1
TestHostAddrChangeDetection,1
TestLocalIPChangesWhenListenAddrChanges,1
TestAutoNATPrivate,1.01
TestIdentifyDeltaWhileIdentifyingConn,1.01
TestDefaultTransportWithMemoryLimit/github.com/libp2p/go-libp2p/p2p/muxer/testsuite.SubtestStreamOpenStress,1.06
TestDefaultTransportWithMemoryLimit,1.46
TestStreamsWithLatency,1.5
TestDefaultTransport/github.com/libp2p/go-libp2p/p2p/muxer/testsuite.SubtestStreamOpenStress,1.57
TestObsAddrSet,1.57
TestConnectionGating,1.59
TestTcpTransport,1.82
TestDsPeerstore/Leveldb/BasicPeerstore,1.91
TestDefaultTransport/github.com/libp2p/go-libp2p/p2p/muxer/testsuite.SubtestStreamOpenStress,1.95
TestDialWorkerLoopConcurrentFailureStress,2
TestRelayLimitTime,2.01
TestAutoNATPublictoPrivate,2.02
TestDefaultTransport,2.02
TestObservedAddrFiltering,2.05
TestStatelessReset,2.13
TestInMemoryPeerstore/BasicPeerstore,2.14
TestDefaultTransport,2.41
TestDsKeyBook/Leveldb/PeersWithKeys,2.47
TestInMemoryKeyBook/PeersWithKeys,2.49
TestDsPeerstore/Leveldb,2.71
TestDsPeerstore,2.72
TestLimitedStreams,2.73
TestInMemoryKeyBook,2.81
TestInMemoryPeerstore,2.91
TestStressLimiter,2.99
TestQuickBurstRespectsSilencePeriod,3
TestDsKeyBook,3.06
TestDsKeyBook/Leveldb,3.06
I just discovered https://github.com/pl-strflt/tf-aws-gh-runner.
@galargh What would need to be done to use these runners for go-libp2p? It looks like it would clash with Unified CI, since the runs-on is hardcoded.
I just discovered https://github.com/pl-strflt/tf-aws-gh-runner. @galargh What would need to be done to use these runners for go-libp2p? It looks like it would clash with Unified CI, since the
runs-onis hardcoded.
Let's make them configurable! https://github.com/protocol/.github/pull/443
GitHub just announced configuration variables which, in my opinion, are a pretty good fit for this.
Any chance we can pick this up at some point? Slow CI probably slowed me down by 1-2h while cutting the v0.27.0 release. The most important thing would be to dramatically increase horizontal scaling. When merging multiple PRs, jobs get queued up for a very long time, since no runners are available.
I just put ubuntu jobs from Go Test workflow on self-hosted GHA runner. Self-hosted runners are sourced from a separate pool which should free up the resources for other jobs. Additionally, it speeds up the execution of the job ~13m -> ~6m.
This change will only affect the runs initiated in this repository (i.e. workflow runs triggered on pull_request from forks will keep running on GH machines).
If we happen to have to revert the change, here are the instructions:
- Go to https://github.com/libp2p/go-libp2p/settings/variables/actions
- Delete
UCI_GO_TEST_RUNNER_UBUNTURepository variables
Next steps from the IPDX side, prepare windows runners for use in Go Test workflow.
Thank you @galargh, this is great!
Is the UCI_GO_TEST_RUNNER_UBUNTU format documented anywhere? I'm struggling making sense of it:

Good catch, we didn't mention the format in https://github.com/protocol/.github#configuration-variables before. Now I added it in https://github.com/protocol/.github/commit/43e27921600f05276a1259a9d9d67cdb9ec4e905.
UCI_*_RUNNER_*variables expect the values to be JSON formatted. For example, if you want theMacOSrunner used inGo Testworkflow to bemacos-12specifically, you'd setUCI_GO_TEST_RUNNER_MACOSto"macos-12"(notice the"around the string); and if you want yourUbunturunner to be a self hosted machine with labelsthis,is,my,self-hosted,runner, you'd setUCI_GO_TEST_RUNNER_UBUNTUto["this", "is", "my", "self-hosted", "runner"].
What are these labels? Each label (except for the "go-libp2p") makes sense on its own, but I don't understand how they compose.
We use https://github.com/philips-labs/terraform-aws-github-runner to set up our self-hosted runners. It gives the runners os, architecture and self-hosted labels by default.
Additionally, we label our self-hosted runners with a runner group they belong to, see https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/runners.tf#L150. In this case, the runner group we're using is go-libp2p defined at https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/runners.tf#L80
As you can see by looking at the available Runner Types, we've been grouping the runners by repository so far but as the solution gains popularity, I'm thinking it'd make for an easier set up to group the runner types by sizes and only break out of this scheme if some repository really needs a dedicated pool of runners.
I added size-based groups to our self-hosted solution. The labels in this repo now read: ['self-hosted', 'linux', 'x64', '2xlarge'].