tastora framework refactor proposal
I would separate the test cases tastora would need to test for into a few categories:
api tests
- these should purely test API and ensure it's not breaking against what the tests expect
- this means using the go rpcclient and also raw curl against the node's RPC port
- these do not need a complex setup: a single BN and a single LN are enough, with the RPC requests targeting both BN and LN to ensure they're both compliant.
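A rough sketch of what the raw-request half of such an API test could look like, assuming the node speaks JSON-RPC 2.0 over HTTP on its RPC port. The method names (node.Info, header.LocalHead) are only illustrative, and fakeNode is a hypothetical stand-in for a real BN / LN so the snippet runs on its own:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rpcResponse is the subset of a JSON-RPC 2.0 response this check cares about.
type rpcResponse struct {
	Result json.RawMessage `json:"result"`
	Error  *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// callRPC posts a parameterless JSON-RPC 2.0 request and returns the raw result.
func callRPC(url, method string) (json.RawMessage, error) {
	body, err := json.Marshal(map[string]any{
		"jsonrpc": "2.0", "id": 1, "method": method, "params": []any{},
	})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s: unexpected HTTP status %s", method, resp.Status)
	}
	var out rpcResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("%s: decoding response: %w", method, err)
	}
	if out.Error != nil {
		return nil, fmt.Errorf("%s: rpc error %d: %s", method, out.Error.Code, out.Error.Message)
	}
	return out.Result, nil
}

// checkAll runs every method against every endpoint and collects the failures.
func checkAll(endpoints map[string]string, methods []string) []error {
	var errs []error
	for name, url := range endpoints {
		for _, m := range methods {
			if _, err := callRPC(url, m); err != nil {
				errs = append(errs, fmt.Errorf("%s: %w", name, err))
			}
		}
	}
	return errs
}

// fakeNode stands in for a node's RPC port (hypothetical helper); it only
// answers the methods listed in known and rejects everything else.
func fakeNode(known ...string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Method string `json:"method"`
		}
		_ = json.NewDecoder(r.Body).Decode(&req)
		for _, k := range known {
			if k == req.Method {
				fmt.Fprint(w, `{"jsonrpc":"2.0","id":1,"result":{}}`)
				return
			}
		}
		fmt.Fprint(w, `{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"method not found"}}`)
	}))
}

func main() {
	bn, ln := fakeNode("node.Info", "header.LocalHead"), fakeNode("node.Info")
	defer bn.Close()
	defer ln.Close()
	endpoints := map[string]string{"BN": bn.URL, "LN": ln.URL}
	for _, err := range checkAll(endpoints, []string{"node.Info", "header.LocalHead"}) {
		fmt.Println("non-compliant:", err)
	}
}
```

In a real test the two URLs would come from the framework's BN and LN instead of fakeNode, and the same method list would be run through the go rpcclient as well.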
e2e - sanity
- these test a specific flow of functionality; e.g. DASing, network upgrade (V4 -> V5)
- some header sync tests fall here too
- These tests should only require the most basic setup - 1 BN 1 LN
- app upgrade tests fall here
e2e - p2p sanity
- these tests should sanity check that a basic network topology of nodes can sync, DAS, submit/retrieve blobs -> I think one test can cover this, TestBasicNetworkOperation or something like that, that can contain several cases inside to check for health metrics (local head of nodes, sampling stats, etc).
- discovery tests fall here
- more complicated header sync tests also fall here
- The topology should be a bit more complex - 2 BNs 5 LNs (something like that) connected acyclically, ensuring all nodes are reaching the desired network height, sampling successfully, and can at least retrieve blobs (don’t worry about submission here)
- tests like basic reconstruction, bootstrapping would fall into this category
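To make the "all nodes are reaching the desired network height" check a bit more concrete, here is a minimal sketch of a polling helper. The Node interface and its LocalHead method are hypothetical (not tastora's actual API), and fakeNode only simulates sync progress so the example is runnable:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Node is a hypothetical view of a DA node; only its local head height matters here.
type Node interface {
	Name() string
	LocalHead(ctx context.Context) (uint64, error)
}

// laggards returns the names of the nodes whose local head is below target.
func laggards(ctx context.Context, nodes []Node, target uint64) ([]string, error) {
	var behind []string
	for _, n := range nodes {
		h, err := n.LocalHead(ctx)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", n.Name(), err)
		}
		if h < target {
			behind = append(behind, n.Name())
		}
	}
	return behind, nil
}

// waitForHeight polls until every node has reached target or ctx expires.
func waitForHeight(ctx context.Context, nodes []Node, target uint64, every time.Duration) error {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		behind, err := laggards(ctx, nodes, target)
		if err != nil {
			return err
		}
		if len(behind) == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("nodes still below height %d: %v: %w", target, behind, ctx.Err())
		case <-ticker.C:
		}
	}
}

// fakeNode simulates sync progress: each query advances the head by one,
// unless the node is stuck.
type fakeNode struct {
	name  string
	head  uint64
	stuck bool
}

func (f *fakeNode) Name() string { return f.name }

func (f *fakeNode) LocalHead(context.Context) (uint64, error) {
	if !f.stuck {
		f.head++
	}
	return f.head, nil
}

// fakeTopology builds 2 BNs and 5 LNs; stuckLN freezes the last light node.
func fakeTopology(stuckLN bool) []Node {
	var nodes []Node
	for i := 0; i < 2; i++ {
		nodes = append(nodes, &fakeNode{name: fmt.Sprintf("bn-%d", i)})
	}
	for i := 0; i < 5; i++ {
		nodes = append(nodes, &fakeNode{name: fmt.Sprintf("ln-%d", i), stuck: stuckLN && i == 4})
	}
	return nodes
}

// syncWithin reports whether the topology reaches target before the timeout.
func syncWithin(nodes []Node, target uint64, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitForHeight(ctx, nodes, target, time.Millisecond)
}

func main() {
	fmt.Println("healthy:", syncWithin(fakeTopology(false), 10, time.Second))
	fmt.Println("one stuck LN:", syncWithin(fakeTopology(true), 10, 50*time.Millisecond))
}
```

Sampling stats and blob retrieval could be added as further cases on the same topology, so that one TestBasicNetworkOperation-style test covers all the health metrics.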
e2e - p2p complex / routing
- these tests define more complicated scenarios where there are several nodes on the network, some of them faulty, and verify that healthy nodes can still fetch data / operate
- Topology should be more complex here (as per what tests need)
- Tests like TestShrexNDFromLightsWithBadFulls, TestFullReconstructFromLights and any tests that test for archival request routing would be included here (as they’re more complicated network scenarios)
- These tests often require more complicated setups and are cold-path / edge case scenarios that we need to ensure the node is protected against
- I would say most of the fraud tests also land in this category
framework
- the framework should always have a single BN that is the minimum viable DA network (can keep it as pointer on the framework struct) and acts as the main entrypoint to the network for DA blocks/headers
- SetupNetwork should always set up the main BN as no DA tests can function without one
- can also have a separate field for additional BNs that are needed for other tests with more complex topology requirements
- a separate field for all LNs (as 1 or several may be required for tests)
- currently the API for creating / getting BNs and LNs is a bit all over the place; please separate into
  - New (should include implicit start); we may need to rethink this for certain topologies / more complicated tests, but let’s start implicitly inside of New for now
  - Get
- All funding endpoints should be private to framework and should not need to be touched outside of setting up blob submission (to fill up network)
- let’s just hide / unify most of this functionality until we define a concrete set of test cases to test blob submission from DA nodes
- I will do this when I do the multi-core endpoint tests
- generally, remove all unused methods on framework
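A minimal sketch of the shape described above, just to make the proposal concrete. All type, field, and method names here are hypothetical (only SetupNetwork is named in the proposal), and node creation is stubbed out instead of starting real containers:

```go
package main

import "fmt"

// NodeType distinguishes bridge nodes (BN) from light nodes (LN).
type NodeType string

const (
	Bridge NodeType = "bridge"
	Light  NodeType = "light"
)

// Node is a placeholder for whatever handle the framework keeps per DA node.
type Node struct {
	Type    NodeType
	Name    string
	Started bool
}

// Framework owns the topology: one mandatory main BN plus optional extras.
type Framework struct {
	mainBridge   *Node   // minimum viable DA network; entrypoint for DA blocks/headers
	extraBridges []*Node // only for tests with more complex topology requirements
	lights       []*Node // one or several, depending on the test
}

// SetupNetwork always brings up the main BN, since no DA test works without one.
func SetupNetwork() *Framework {
	f := &Framework{}
	f.mainBridge = f.start(Bridge, "bn-main")
	return f
}

// start is the single private place where nodes are created and started (stubbed).
func (f *Framework) start(t NodeType, name string) *Node {
	return &Node{Type: t, Name: name, Started: true}
}

// NewBridgeNode creates and implicitly starts an additional BN.
func (f *Framework) NewBridgeNode() *Node {
	n := f.start(Bridge, fmt.Sprintf("bn-%d", len(f.extraBridges)))
	f.extraBridges = append(f.extraBridges, n)
	return n
}

// NewLightNode creates and implicitly starts an LN.
func (f *Framework) NewLightNode() *Node {
	n := f.start(Light, fmt.Sprintf("ln-%d", len(f.lights)))
	f.lights = append(f.lights, n)
	return n
}

// GetBridgeNode returns the main BN.
func (f *Framework) GetBridgeNode() *Node { return f.mainBridge }

// GetLightNode returns the i-th light node created so far.
func (f *Framework) GetLightNode(i int) (*Node, error) {
	if i < 0 || i >= len(f.lights) {
		return nil, fmt.Errorf("light node %d not found (have %d)", i, len(f.lights))
	}
	return f.lights[i], nil
}

func main() {
	f := SetupNetwork()
	ln := f.NewLightNode()
	fmt.Println(f.GetBridgeNode().Name, ln.Name, ln.Started)
}
```

Funding is left out of the sketch on purpose; under this proposal it would live in unexported helpers next to start.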
Agree with the direction and the idea to separate tests from API to e2e. Hard to reference particular items, so here is just a list of random thoughts on what can be added:
- Reconstruction tests are out of the question for now
- API auth / disable auth tests
- cors tests (lower prio)
- Different transport protocol tests. We need to ensure each protocol works:
  - webtransport
  - webrtc
  - quic
  - tcp
- One/multiple bootstrappers being unavailable test
- LN <-> LN headers propagation test (one light node needs to be disconnected from all BNs)
- Discovery: BN should be discoverable via DHT. Would need multiple BN and 1 LN to observe
- Node restart scenarios (BN, LN).
  - Fresh start might be different from restart; worth testing that nodes restore state / continue properly after restart.
  - Force quit should not corrupt data.
- Pruning tests. Less of a priority, but still worth thinking about.
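For the transport protocol item in the list above, a test could first assert that a node actually listens on every required transport before exercising each one. A rough sketch, assuming libp2p-style multiaddr strings; the helper names are hypothetical and the parsing is plain string matching rather than the multiaddr library:

```go
package main

import (
	"fmt"
	"strings"
)

// transportOf classifies a multiaddr string by its outermost transport.
// Order matters: a webtransport address also contains udp and quic-v1.
func transportOf(addr string) string {
	has := map[string]bool{}
	for _, part := range strings.Split(addr, "/") {
		has[part] = true
	}
	switch {
	case has["webtransport"]:
		return "webtransport"
	case has["webrtc-direct"], has["webrtc"]:
		return "webrtc"
	case has["quic-v1"], has["quic"]:
		return "quic"
	case has["tcp"]:
		return "tcp"
	}
	return ""
}

// missingTransports returns the required transports that no listen address covers.
func missingTransports(listenAddrs, required []string) []string {
	covered := map[string]bool{}
	for _, a := range listenAddrs {
		covered[transportOf(a)] = true
	}
	var missing []string
	for _, r := range required {
		if !covered[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// Example listen addresses; a real test would read them from the node.
	addrs := []string{
		"/ip4/127.0.0.1/tcp/2121",
		"/ip4/127.0.0.1/udp/2121/quic-v1",
		"/ip4/127.0.0.1/udp/2121/quic-v1/webtransport",
	}
	required := []string{"webtransport", "webrtc", "quic", "tcp"}
	fmt.Println("missing transports:", missingTransports(addrs, required))
}
```

The per-protocol tests themselves would then dial the node over each transport in turn, ideally as sub-tests of one table-driven test.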
Generally, I like the idea of this proposal. From my side, I'd add some extra ideas to consider:
- backwards compatibility. This is an essential test during migration (when we do not expect any breaking changes). It could be optional and turned on/off when needed.
- some stress tests, e.g. the network under massive blob submission.
> this means using the go rpcclient and also raw curl against the node's RPC port
Comparing the results for equality is a good idea, but I'd also compare the result with the open-rpc doc to ensure that the docs are up to date.
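One way the open-rpc comparison could work, as a sketch: read the method names from the OpenRPC document (its top-level methods array) and diff them against the methods the API tests actually got answers for. How the served list is collected is left open here, and the function names are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// openRPCDoc captures only the method names of an OpenRPC document.
type openRPCDoc struct {
	Methods []struct {
		Name string `json:"name"`
	} `json:"methods"`
}

// documentedMethods extracts the set of method names from an OpenRPC document.
func documentedMethods(spec []byte) (map[string]bool, error) {
	var doc openRPCDoc
	if err := json.Unmarshal(spec, &doc); err != nil {
		return nil, fmt.Errorf("parsing OpenRPC document: %w", err)
	}
	set := make(map[string]bool, len(doc.Methods))
	for _, m := range doc.Methods {
		set[m.Name] = true
	}
	return set, nil
}

// diffMethods compares documented methods with those the node actually served.
// stale: documented but not served; undocumented: served but missing from the docs.
func diffMethods(documented map[string]bool, served []string) (stale, undocumented []string) {
	seen := make(map[string]bool, len(served))
	for _, m := range served {
		if seen[m] {
			continue // ignore duplicates in the served list
		}
		seen[m] = true
		if !documented[m] {
			undocumented = append(undocumented, m)
		}
	}
	for m := range documented {
		if !seen[m] {
			stale = append(stale, m)
		}
	}
	sort.Strings(stale)
	sort.Strings(undocumented)
	return stale, undocumented
}

func main() {
	// Tiny illustrative document; a real test would load the node's generated spec.
	spec := []byte(`{"openrpc":"1.2.6","methods":[{"name":"node.Info"},{"name":"blob.Get"}]}`)
	documented, err := documentedMethods(spec)
	if err != nil {
		panic(err)
	}
	stale, undocumented := diffMethods(documented, []string{"node.Info", "header.LocalHead"})
	fmt.Println("documented but not served:", stale)
	fmt.Println("served but undocumented:", undocumented)
}
```

This only checks method names; comparing params and result schemas against the doc would be a natural next step.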
@gupadhyaya is it okay if this issue supersedes #4439 for now? I will define the most urgent tests in order of prio here and then we can take it from there.
The most urgent for now is the framework refactor proposal + changing the tests in blob_test to reflect it.
Parking this here:
https://github.com/celestiaorg/celestia-app/blob/main/test/docker-e2e/e2e_upgrade_test.go#L21 we should somehow “import” this and make sure a BN can sync through an upgrade
expanding the proposal: https://github.com/celestiaorg/celestia-node/issues/4489