Integration test from snapshot
The integration tests that run in epoch 3.0 take a significant amount of time just to boot to the initial state for the test. Anecdotal measurements show a test with 2 miners takes about 140s to boot to epoch 3.0, then the actual test only takes about 16s to complete. These numbers are from a powerful local machine, and the times are likely to be much longer on the CI runners. My theory is that we could speed this up by loading an initial state in which the Bitcoin network and the Stacks network are already advanced into epoch 3.0.
Adding an idea from the call: during the nextest cache creation, also create a chainstate archive that can be shared among all tests. The caveat is that a lot of tests will need to be rewritten due to expected balances etc.
Another challenge is how to replicate this locally.
the caveat is that a lot of tests will need to be rewritten due to expected balances etc.
Agreed, some re-writing will be necessary regardless of what we do. The snapshot could start with one account with a huge balance, then the first step in spinning up the test is moving initial balances to the accounts specified by the test.
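To make the "one funded account" idea concrete, here is a minimal sketch of how a test harness could plan the initial transfers. Everything in it is an assumption made for illustration: `Transfer`, `plan_initial_balances`, the flat `fee_per_transfer`, and the use of plain strings as account identifiers are hypothetical and do not correspond to existing stacks-core APIs.

```rust
use std::collections::BTreeMap;

/// Hypothetical description of one transfer out of the snapshot's funder account.
#[derive(Debug, PartialEq, Eq)]
pub struct Transfer {
    pub recipient: String,
    pub amount: u64,
}

/// Plan the transfers that give each test account its requested balance.
/// Accounts requesting a zero balance are skipped. Returns an error if the
/// funder cannot cover all amounts plus one (assumed flat) fee per transfer.
pub fn plan_initial_balances(
    funder_balance: u64,
    fee_per_transfer: u64,
    requested: &BTreeMap<String, u64>,
) -> Result<Vec<Transfer>, String> {
    let mut total: u64 = 0;
    let mut transfers = Vec::new();
    for (recipient, amount) in requested {
        if *amount == 0 {
            continue;
        }
        // Track the running cost with overflow checks.
        total = total
            .checked_add(*amount)
            .and_then(|t| t.checked_add(fee_per_transfer))
            .ok_or_else(|| "requested balances overflow u64".to_string())?;
        transfers.push(Transfer {
            recipient: recipient.clone(),
            amount: *amount,
        });
    }
    if total > funder_balance {
        return Err(format!(
            "funder has {funder_balance}, but {total} is needed"
        ));
    }
    Ok(transfers)
}
```

A test would declare the balances it expects, call something like this right after loading the snapshot, and submit the resulting transfers before running its own logic. That keeps the snapshot itself independent of any single test's expected balances.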
Here is a wrap-up of the last naka-sync. Basically, 3 strategies for dealing with bitcoind and the related tests have been identified:
- bitcoind: for End-2-End testing, where we may need the "real thing"
- Snapshot: this approach can also be used for End-2-End testing, when we want to set up the tests at a specific chain state
- bitcoind mock: for pure integration testing purposes
A summary of the 3 strategies:
Bitcoind
This is the current approach used in our testing. Although it is good for reproducing real scenarios, it makes our tests "heavy" to run in different ways:
- booting a toolchain from scratch and waiting to arrive at the right state before executing the relevant part of a test is both time consuming and resource intensive. (NOTE: this effect could depend on how responsive bitcoind is, but also on stacks-core having to re-process the toolchain state)
- tests based on bitcoind cannot be run with proper parallelism (or with any parallelism at all), due to the network setup required or the need to spawn multiple bitcoind processes
Snapshot
Creating a chain snapshot could be useful, even when working with bitcoind, to speed up the boot time for the tests. Basically, the test can start doing the relevant things right away. Here the friction could be in how to produce, manage, and make those snapshots available to the tests (and possibly adapt the tests as mentioned in the comments above).
One approach that has been proposed (to avoid storing them somewhere) is to generate those snapshots when building the tests.
The gain is most noticeable when many tests are run at once (mainly on CI). On the other hand, the process can be slow when you need to run a specific test multiple times on a dev machine. (Possibly this could be mitigated with some kind of caching?!)
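As a rough illustration of the caching idea for the dev-machine case, the sketch below reuses a snapshot directory if a previous run completed it and regenerates it otherwise. The function name `load_or_create_snapshot`, the `.complete` marker file, and the directory layout are all assumptions made for this example, not something that exists in stacks-core.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical marker file written only after a snapshot was fully generated.
const COMPLETE_MARKER: &str = ".complete";

/// Return the directory of the named snapshot, generating it on a cache miss.
/// The boolean is `true` when the snapshot was served from the cache.
pub fn load_or_create_snapshot<F>(
    cache_root: &Path,
    name: &str,
    generate: F,
) -> io::Result<(PathBuf, bool)>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let dir = cache_root.join(name);
    if dir.join(COMPLETE_MARKER).exists() {
        // A previous run finished this snapshot: reuse it as-is.
        return Ok((dir, true));
    }
    // Discard any partial snapshot left behind by an interrupted run.
    if dir.exists() {
        fs::remove_dir_all(&dir)?;
    }
    fs::create_dir_all(&dir)?;
    // The slow part: boot the chain and write the chainstate into `dir`.
    generate(&dir)?;
    // Mark the snapshot as complete only after generation succeeded.
    fs::write(dir.join(COMPLETE_MARKER), b"")?;
    Ok((dir, false))
}
```

With this shape, the first local run of a test pays the full boot cost and later runs start from the cached directory. Cache invalidation (for example when the chainstate format changes) is not handled here and would need its own rule.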
Bitcoind Mock
Stubbing bitcoind also seems to be a possibility, and this could bring several advantages:
- reproducing/forcing sneaky cases, thanks to the complete control over the bitcoin chain.
- test parallelism: using the mock, we should be able to make tests independent of each other, giving the chance to run them in parallel
- fast execution (although the latency of stacks-core processing still has to be considered, for tests simulating the toolchain boot from scratch)
However, an evaluation of the real feasibility needs to be done (and possibly the related solution implemented).
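To give a feel for what "complete control over the bitcoin chain" could look like, here is a toy sketch of a mock behind a small trait. The trait `BitcoinBackend`, the struct `MockBitcoin`, and their methods are hypothetical names invented for this example. The real interface between stacks-core and bitcoind is much larger, so this only illustrates the idea and says nothing about feasibility.

```rust
/// Hypothetical minimal interface a test needs from a Bitcoin backend.
pub trait BitcoinBackend {
    /// Append `count` new blocks to the canonical chain.
    fn mine_blocks(&mut self, count: u64);
    /// Number of blocks on the canonical chain.
    fn tip_height(&self) -> u64;
    /// Replace the last `depth` blocks with different ones (a forced fork).
    fn reorg(&mut self, depth: u64);
}

/// In-memory mock: each block is just a unique id, there is no networking.
#[derive(Default)]
pub struct MockBitcoin {
    chain: Vec<u64>,
    next_id: u64,
}

impl MockBitcoin {
    /// Id of the current tip block, if any block has been mined.
    pub fn tip_id(&self) -> Option<u64> {
        self.chain.last().copied()
    }
}

impl BitcoinBackend for MockBitcoin {
    fn mine_blocks(&mut self, count: u64) {
        for _ in 0..count {
            self.chain.push(self.next_id);
            self.next_id += 1;
        }
    }

    fn tip_height(&self) -> u64 {
        self.chain.len() as u64
    }

    fn reorg(&mut self, depth: u64) {
        let keep = self.chain.len().saturating_sub(depth as usize);
        let removed = (self.chain.len() - keep) as u64;
        self.chain.truncate(keep);
        // Replacement blocks get fresh ids, so they differ from the orphaned ones.
        self.mine_blocks(removed);
    }
}
```

Because each `MockBitcoin` lives entirely in memory, every test can own its own instance, which is what would make parallel runs straightforward. A forced fork is a single deterministic call (`reorg`) rather than something coaxed out of a real daemon.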
Findings
In general these 3 strategies shouldn't be mutually exclusive, but can be used all together depending on the test case:
- real bitcoind: this is still relevant when we want to run real E2E tests. If we eventually succeed in covering stacks-core functionality with the other strategies, we could probably think about reducing the number of tests of this kind or limiting them to very specific scenarios.
- snapshot: this can help to improve performance of the bitcoind strategy. An efficient way to manage those snapshots needs to be found, to make them easy to create, update, and load for the relevant tests.
- bitcoind mock: this can help us to write more integration tests, possibly producing deterministically any kind of test scenario we could think of (forcing corner cases, for instance). It would also enable tests to be run in parallel.
A variant of the real bitcoind strategy could be to containerize the daemon and spawn it in the tests using the testcontainers crate.
This, combined with a proper nextest configuration, could allow us to address the parallelism issue.
NOTE: This approach has been adopted in sBTC. Here is the relevant PR: https://github.com/stacks-sbtc/sbtc/pull/1572
A first prototype of snapshotting is about to be deployed: https://github.com/stacks-network/stacks-core/pull/6163
Proposal for next steps
(short term):
- Apply the snapshotting to a first selection of integration tests
- Improve the current approach to increase reliability
- Automatically include a bunch of UTXOs for a set of addresses (with simple hardcoded keys)
(mid term):
- Try to use "group" snapshots that can be reused by multiple tests (to reduce disk space)
- Try to generate the snapshots before the tests run