tastora framework refactor proposal

Open renaynay opened this issue 4 months ago • 5 comments

I would separate the test cases tastora would need to test for into a few categories:

api tests

  • these should purely test the API and ensure it does not break what the tests expect
  • this means using the go rpcclient and also raw curl against the node's RPC port
  • these do not need a complex setup: they only need a BN and a LN, so that the RPC requests can target both and ensure both are compliant.
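
A minimal sketch of the "raw curl" half of such a test, written in Go so it can sit next to the rpcclient calls. It assumes the node speaks JSON-RPC 2.0 over HTTP POST; `rawCall` and `stubNode` are hypothetical helpers, the stub servers stand in for a real BN and LN, and `header.LocalHead` is used only as an example method name. A real node may additionally require an auth token, which is left out here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  []any  `json:"params"`
}

// rpcResponse keeps Result raw so that answers from different nodes
// (or from different clients) can be compared byte for byte.
type rpcResponse struct {
	Result json.RawMessage `json:"result"`
	Error  *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// rawCall (hypothetical helper) posts one JSON-RPC request to url and
// returns the raw result, or an error if the node answered with one.
func rawCall(url, method string, params ...any) (json.RawMessage, error) {
	if params == nil {
		params = []any{} // marshal as [] rather than null
	}
	body, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: 1, Method: method, Params: params})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out rpcResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if out.Error != nil {
		return nil, fmt.Errorf("rpc error %d: %s", out.Error.Code, out.Error.Message)
	}
	return out.Result, nil
}

// stubNode (hypothetical) stands in for a node's RPC endpoint and answers
// every request with the given JSON result.
func stubNode(result string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req rpcRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, `{"jsonrpc":"2.0","id":%d,"result":%s}`, req.ID, result)
	}))
}

func main() {
	bn := stubNode(`{"height":"42"}`)
	defer bn.Close()
	ln := stubNode(`{"height":"42"}`)
	defer ln.Close()

	// The same request targets both nodes; the test then checks both answers.
	fromBN, err := rawCall(bn.URL, "header.LocalHead")
	if err != nil {
		panic(err)
	}
	fromLN, err := rawCall(ln.URL, "header.LocalHead")
	if err != nil {
		panic(err)
	}
	fmt.Println("BN and LN agree:", bytes.Equal(fromBN, fromLN))
}
```

In a real test the two URLs would point at the BN's and LN's RPC ports, and the raw result could also be compared against what the go rpcclient returns for the same call.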

e2e - sanity

  • these test a specific flow of functionality; e.g. DASing, network upgrade (V4 -> V5)
    • some header sync tests fall here too
  • These tests should only require the most basic setup - 1 BN 1 LN
  • app upgrade tests fall here

e2e - p2p sanity

  • these tests should sanity check that a basic network topology of nodes can sync, DAS, and submit/retrieve blobs. I think one test can cover this (TestBasicNetworkOperation or something like that), containing several cases that check health metrics (local head of nodes, sampling stats, etc).
    • discovery tests fall here
    • more complicated header sync tests also fall here
  • The topology should be a bit more complex: 2 BNs and 5 LNs (something like that) connected acyclically, ensuring all nodes reach the desired network height, sample successfully, and can at least retrieve blobs (don't worry about submission here)
  • tests like basic reconstruction, bootstrapping would fall into this category

e2e - p2p complex / routing

  • these tests define more complicated scenarios with several nodes on the network, some of them faulty, and check that healthy nodes can still fetch data / operate
  • Topology should be more complex here (as per what tests need)
  • Tests like TestShrexNDFromLightsWithBadFulls, TestFullReconstructFromLights and any tests that test for archival request routing would be included here (as they're more complicated network scenarios)
  • These tests often require more complicated setups and are cold-path / edge case scenarios that we need to ensure the node is protected against
  • I would say most of the fraud tests also land in this category

framework

  • the framework should always have a single BN that is the minimum viable DA network (can keep it as a pointer on the framework struct) and acts as the main entrypoint to the network for DA blocks/headers
    • SetupNetwork should always set up the main BN as no DA tests can function without one
  • can also have a separate field for additional BNs that are needed for other tests with more complex topology requirements
  • a separate field for all LNs (as 1 or several may be required for tests)
  • currently the api for creating / getting BNs and LNs is a bit all over the place; please separate into
    • New (should include implicit start); we may need to rethink this for certain topologies / more complicated tests, but let's do the implicit start inside of New for now
    • Get
  • All funding endpoints should be private to framework and should not need to be touched outside of setting up blob submission (to fill up network)
    • let’s just hide / unify most of this functionality until we define a concrete set of test cases to test blob submission from DA nodes
      • I will do this when I do the multi-core endpoint tests
  • generally, remove all unused methods on framework
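
A rough sketch of the shape described in the list above, not the actual tastora or celestia-node code. All names here (`Framework`, `Node`, `SetupNetwork`, `MainBridgeNode`, `NewLightNode`, `GetLightNode`, and so on) are hypothetical, and `Node` is a stand-in that only records whether it was started.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a framework-managed DA node.
type Node struct {
	Name    string
	started bool
}

// Started reports whether the node has been started.
func (n *Node) Started() bool { return n.started }

// Framework keeps one main BN plus separate collections for additional
// BNs and for LNs.
type Framework struct {
	mainBN      *Node            // minimum viable DA network, always set
	bridgeNodes map[string]*Node // extra BNs for more complex topologies
	lightNodes  map[string]*Node // 1 or several LNs, depending on the test
}

// SetupNetwork always brings up the main BN, since no DA test can
// function without one.
func SetupNetwork() *Framework {
	return &Framework{
		mainBN:      &Node{Name: "bn-main", started: true},
		bridgeNodes: make(map[string]*Node),
		lightNodes:  make(map[string]*Node),
	}
}

// MainBridgeNode returns the main entrypoint to the network.
func (f *Framework) MainBridgeNode() *Node { return f.mainBN }

// New* creates a node and starts it implicitly; Get* only looks it up.
func (f *Framework) NewBridgeNode(name string) (*Node, error) { return newNode(f.bridgeNodes, name) }
func (f *Framework) GetBridgeNode(name string) (*Node, error) { return getNode(f.bridgeNodes, name) }
func (f *Framework) NewLightNode(name string) (*Node, error)  { return newNode(f.lightNodes, name) }
func (f *Framework) GetLightNode(name string) (*Node, error)  { return getNode(f.lightNodes, name) }

// newNode registers a started node under name, rejecting duplicates.
func newNode(set map[string]*Node, name string) (*Node, error) {
	if _, ok := set[name]; ok {
		return nil, fmt.Errorf("node %q already exists", name)
	}
	n := &Node{Name: name, started: true} // implicit start inside New
	set[name] = n
	return n, nil
}

// getNode returns a previously created node.
func getNode(set map[string]*Node, name string) (*Node, error) {
	n, ok := set[name]
	if !ok {
		return nil, fmt.Errorf("node %q not found", name)
	}
	return n, nil
}

func main() {
	fw := SetupNetwork()
	ln, err := fw.NewLightNode("ln-0")
	if err != nil {
		panic(err)
	}
	fmt.Println(fw.MainBridgeNode().Name, ln.Name, ln.Started())
}
```

Funding helpers would stay unexported on `Framework` in this layout, so tests only see New and Get.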

renaynay avatar Aug 01 '25 10:08 renaynay

Agree with the direction and the idea to separate tests from API to e2e. It's hard to reference particular items, so here is just a list of random thoughts on what can be added:

  • Reconstruction tests are out of the question for now
  • API auth / disable auth tests
    • cors tests (lower prio)
  • Different transport protocol tests. We need to ensure each protocol works:
    • webtransport
    • webrtc
    • quic
    • tcp
  • Test with one or multiple bootstrappers being unavailable
  • LN <-> LN headers propagation test (one light node needs to be disconnected from all BNs)
  • Discovery: BN should be discoverable via DHT. Would need multiple BNs and 1 LN to observe
  • Node restart scenarios (BN, LN).
    • Fresh start might be different from restart; worth testing that nodes restore state / continue properly after restart.
    • Force quit should not corrupt data.
  • Pruning tests. Less of a priority, but still worth thinking about.

walldiss avatar Aug 04 '25 14:08 walldiss

Generally, I like the idea of this proposal. From my side, I'd add some extra ideas to consider:

  • backwards compatibility. This is an essential test during migration (when we do not expect any breaking changes). It could be optional and turned on/off when needed.
  • some stress tests, e.g. the network under a massive blob submission.

this means using the go rpcclient and also raw curl against the node's RPC port

Comparing the results for equality is a good idea, but I'd also compare the result with the open-rpc doc to ensure that the docs are up to date.
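
A small sketch of what such a doc check could look like. It relies only on the OpenRPC document having a top-level `methods` array whose entries carry a `name`; `diffMethods` is a hypothetical helper, and the sample document and method lists in `main` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// openRPCDoc captures only the part of an OpenRPC document needed here.
type openRPCDoc struct {
	Methods []struct {
		Name string `json:"name"`
	} `json:"methods"`
}

// diffMethods (hypothetical helper) compares the methods a node serves
// with those listed in the OpenRPC document. It returns served methods
// missing from the doc and documented methods that are no longer served.
func diffMethods(doc []byte, served []string) (undocumented, stale []string, err error) {
	var parsed openRPCDoc
	if err = json.Unmarshal(doc, &parsed); err != nil {
		return nil, nil, fmt.Errorf("parse OpenRPC doc: %w", err)
	}
	documented := make(map[string]bool, len(parsed.Methods))
	for _, m := range parsed.Methods {
		documented[m.Name] = true
	}
	servedSet := make(map[string]bool, len(served))
	for _, name := range served {
		servedSet[name] = true
		if !documented[name] {
			undocumented = append(undocumented, name)
		}
	}
	for name := range documented {
		if !servedSet[name] {
			stale = append(stale, name)
		}
	}
	// Sort for stable test output; map iteration order is random.
	sort.Strings(undocumented)
	sort.Strings(stale)
	return undocumented, stale, nil
}

func main() {
	// Illustrative document and method list, not taken from a real node.
	doc := []byte(`{"openrpc":"1.2.6","methods":[{"name":"header.LocalHead"},{"name":"header.OldMethod"}]}`)
	served := []string{"header.LocalHead", "blob.Submit"}

	undocumented, stale, err := diffMethods(doc, served)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing from docs:", undocumented)
	fmt.Println("documented but not served:", stale)
}
```

A stricter version could also compare parameter and result schemas, but name-level drift is the cheapest thing to catch first.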

vgonkivs avatar Aug 04 '25 14:08 vgonkivs

@gupadhyaya is it okay if this issue supersedes #4439 for now? I will define the most urgent tests in order of prio here and then we can take it from there.

The most urgent for now is the framework refactor proposal + changing the tests in blob_test to reflect it.

renaynay avatar Aug 05 '25 09:08 renaynay

Parking this here:

https://github.com/celestiaorg/celestia-app/blob/main/test/docker-e2e/e2e_upgrade_test.go#L21 we should somehow “import” this and make sure a BN can sync through an upgrade

renaynay avatar Aug 05 '25 13:08 renaynay

expanding the proposal: https://github.com/celestiaorg/celestia-node/issues/4489

gupadhyaya avatar Aug 18 '25 08:08 gupadhyaya