Runtime Parameter Modelling v2
Background
I started tackling this issue: https://github.com/Joystream/joystream/issues/2914. The motivation was to build a model as described in its Background section. I started simple, doing it in a spreadsheet, but quickly ran into issues. The main one is that you have to represent a large number of values and concepts from the pallets, runtime and node state (genesis config) in the spreadsheet model: everything from constants, derived configured values and weight functions to internal implementation details about how parameters impact the operation of the system. The problem is not only chasing down and unpacking all of these values for the model to be complete, but that it creates a totally unmanageable synchronisation problem: whenever Substrate itself, our configurations, weights or constants change, you have to remember to update the model accordingly. That is simply a disaster waiting to happen.
My next step was to consider whether we could build a Rust based tool which would import the GenesisConfig and node, and then read off critical values, e.g. extrinsic weights, giving an automatically up-to-date model of how the system works. This surely would have been better, but the problem with this approach was still that it would have to leak a range of assumptions about how such values actually impact the behaviour of the running system, e.g. the dynamic fee adjustment or extrinsic-class-specific weight limits, and thus would have the same problem as before, albeit milder. It would also make it harder for a broader set of people to play around with the model, as it is relatively hard to work in the Rust codebase.
Next I considered doing everything as our integration tests, with a single validator node, which would have the benefit of not having to replicate the on-chain business logic in any way for simulation purposes. The difficulty here is that in order to run the simulation with realistic time parameters, over periods spanning many days or weeks, the real processing time needed per "virtual block" would be compressed too low. Creating, validating and producing blocks on a single machine would likely not work smoothly, even with manual or instant seal. Synchronising the actions of the model agents with the node's progress would also introduce its own overhead and complexity. Lastly, it would by definition preclude any real multi-validator dynamics, as it would have to be a single validator in any case.
Proposal
Model the behaviour of the blockchain and network by simulation, rather than by running an actual network, with focus on the following:
- Execution of scenarios that are parametrised, in a no-code way, with real-world stochastic, ecologically valid utilisation assumptions and integrity constraint assumptions.
- Models are time continuous, simulated in discrete steps with adjustable time step resolution, encompassing the behaviour of all agents and a minimal model of the processing logic of the chain itself, reflecting whatever areas are being investigated.
- All chain configuration, state variables, fees and other values of interest should be read from a running node and imported into the simulation, to minimise manual synchronisation effort over time (a sketch of this follows the list).
- Collect data on key metrics of interest, and export it in a convenient format for external analysis.
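As an illustration of the third point, here is a minimal sketch of importing live chain values into a simulation using the third-party py-substrate-interface package against a node's standard Substrate RPC. The endpoint and the particular constants queried are assumptions for illustration, not a prescription.

```python
# Illustrative sketch: import chain constants and state from a running node so
# the simulation never hard-codes values that can drift out of sync.
# Assumes a local node exposing the standard Substrate WebSocket RPC and the
# third-party package `substrate-interface` (pip install substrate-interface).
from substrateinterface import SubstrateInterface

def import_chain_parameters(url: str = "ws://127.0.0.1:9944") -> dict:
    substrate = SubstrateInterface(url=url)

    # Pallet constants are read out of the on-chain metadata, so they track
    # whatever runtime the node is actually running.
    existential_deposit = substrate.get_constant("Balances", "ExistentialDeposit")
    block_weights = substrate.get_constant("System", "BlockWeights")

    # Current state values (here: total issuance) come from storage queries.
    total_issuance = substrate.query("Balances", "TotalIssuance")

    # Record the runtime spec version so every simulation run is traceable
    # to the exact runtime it was parametrised from.
    runtime = substrate.rpc_request("state_getRuntimeVersion", []).get("result", {})

    return {
        "spec_version": runtime.get("specVersion"),
        "existential_deposit": existential_deposit.value,
        "block_weights": block_weights.value,
        "total_issuance": total_issuance.value,
    }
```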
In this approach, a model is just an integrated and consistent story of how people will attempt to interact with the system as a whole. This applies to all possible actors, including members, creators, storage providers, etc. It can include both pro-social and anti-social or abusive behaviour, such as denial of service attacks. Of primary importance are the implications of such actions for blockchain utilisation specifically, but a broader model can include off-chain services/nodes as well.
The model is stochastic in the sense that actor behaviour is not deterministic: for example, the inflow of new memberships being created follows some stochastic process (e.g. an inhomogeneous Poisson process), and when executing a scenario during simulation, this process is sampled to determine which actions the actor takes. This is key to discovering how well the system behaves under a very diverse range of circumstances, without having to exhaustively enumerate them all. Also, as servicing certain usage patterns may be very expensive in a variety of ways, having the ability to quantify how likely such cases are, and exactly how poorly the system behaves under them, will allow us to make more informed tradeoffs. As a result of this stochasticity, a given scenario has to be executed multiple times, so that statistics can be collected for key metrics.
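As a concrete sketch of what sampling such a process could look like, the following uses Lewis-Shedler thinning to draw event times from an inhomogeneous Poisson process. The function names and the choice of Python are illustrative assumptions, not part of the proposal.

```python
# Illustrative sketch (not part of the proposal): sampling action times from
# an inhomogeneous Poisson process via Lewis-Shedler thinning.
import random
from typing import Callable, List

def sample_inhomogeneous_poisson(
    rate: Callable[[float], float],  # rate(t): expected events per hour at time t
    horizon_hours: float,            # length of the simulated period
    rate_max: float,                 # upper bound on rate(t) over the horizon
    rng: random.Random,
) -> List[float]:
    """Draw candidate events from a homogeneous process at rate_max, then keep
    each candidate at time t with probability rate(t) / rate_max."""
    times: List[float] = []
    t = 0.0
    while True:
        # Inter-arrival times of the dominating homogeneous process are Exp(rate_max).
        t += rng.expovariate(rate_max)
        if t > horizon_hours:
            return times
        if rng.random() < rate(t) / rate_max:
            times.append(t)
```

Each call with a different seed yields a different sample path, which is exactly what the repeated executions of a scenario would aggregate statistics over.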
Lastly, the behavioural assumptions should have some ecological validity, meaning they attempt to capture important features of real world behaviour. By far the most important example constraint is that most people only use the system while they are awake, and whether people are awake follows a strong periodic pattern over a 24 hour period. This property causes critical temporal spiking patterns in utilisation which are important to capture.
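To make the diurnal constraint concrete, here is a placeholder rate function with a 24 hour sinusoidal cycle that could drive the sampler sketched above. The base rate and peak hour are illustrative assumptions, not calibrated values.

```python
import math
import random

def diurnal_rate(t_hours: float, base: float = 50.0, peak_hour: float = 20.0) -> float:
    """Expected actions per hour at simulation time t_hours (illustrative only).

    base:      mean rate over a full day (placeholder value).
    peak_hour: hour of day at which activity peaks (placeholder value).
    The cosine modulates the rate between 0 and 2 * base once per 24 hours.
    """
    phase = 2.0 * math.pi * ((t_hours - peak_hour) % 24.0) / 24.0
    return base * (1.0 + math.cos(phase))

# One sample path of action times over a simulated week, reusing the thinning
# sampler from the previous sketch; rate_max = 2 * base bounds diurnal_rate.
signups = sample_inhomogeneous_poisson(
    diurnal_rate, horizon_hours=24.0 * 7, rate_max=100.0, rng=random.Random(42)
)
```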