Network restart follow-up issue 2
This issue aims to track the remaining tasks related to Network restart capabilities.

- [ ] Optimise the downtime interpolation of cycles. If the network is down between cycles 20 and 520, we currently interpolate all 500 missing cycles and immediately dismiss 490 of them. If we only interpolate the last 10 cycles, the selector panics. In `massa-final-state/src/final_state.rs`:

  ```rust
  // TODO: Bring back the following optimisation (it fails because of the selector).
  // Then, build all the completed cycles in between. If we have to build more cycles
  // than the cycle_history_length, we only build the last ones.
  // let current_slot_cycle = (current_slot_cycle + 1)
  //     .max(end_slot_cycle.saturating_sub(self.config.pos_config.cycle_history_length as u64));
  let current_slot_cycle = current_slot_cycle + 1;
  ```
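
  For reference, a minimal standalone sketch of the clamping the TODO refers to (purely illustrative: the function name and the `cycle_history_length` value are hypothetical, and the selector issue above still has to be solved before re-enabling it):

  ```rust
  /// Illustrative only: given the last cycle built before the downtime and the cycle
  /// of the restart slot, return the first cycle that actually needs interpolation,
  /// keeping at most `cycle_history_length` trailing cycles.
  fn first_cycle_to_build(
      last_built_cycle: u64,
      end_slot_cycle: u64,
      cycle_history_length: u64,
  ) -> u64 {
      (last_built_cycle + 1).max(end_slot_cycle.saturating_sub(cycle_history_length))
  }

  fn main() {
      // Downtime between cycles 20 and 520 with a hypothetical history length of 10:
      // only cycles 510..=520 get built instead of all 500 missing ones.
      assert_eq!(first_cycle_to_build(20, 520, 10), 510);
  }
  ```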
- [ ] Ensure receiving a non-valid db after bootstrapping does not lead to a panic. Instead, we should restart bootstrap from scratch. In `massa-node/src/main.rs`:

  ```rust
  if !final_state.read().is_db_valid() {
      // TODO: Bootstrap again instead of panicking
      panic!("critical: db is not valid after bootstrap");
  }
  ```
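
  A rough sketch of the intended retry behaviour (names and types are hypothetical, not the actual massa-node code):

  ```rust
  // Illustrative sketch: `FinalState`, `run_bootstrap` and `wipe_db` are
  // hypothetical stand-ins for whatever the node uses to (re)bootstrap
  // and reset its database.
  use std::sync::{Arc, RwLock};

  struct FinalState { db_valid: bool }
  impl FinalState {
      fn is_db_valid(&self) -> bool { self.db_valid }
  }

  fn run_bootstrap(_state: &Arc<RwLock<FinalState>>) { /* stream state from a bootstrap server */ }
  fn wipe_db(_state: &Arc<RwLock<FinalState>>) { /* drop the partially written db */ }

  fn bootstrap_with_retry(state: &Arc<RwLock<FinalState>>, max_attempts: usize) -> Result<(), String> {
      for _ in 0..max_attempts {
          run_bootstrap(state);
          // Instead of panicking on an invalid db, wipe it and bootstrap again.
          if state.read().unwrap().is_db_valid() {
              return Ok(());
          }
          wipe_db(state);
      }
      Err(format!("db still not valid after {max_attempts} bootstrap attempts"))
  }
  ```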
- [x] Remove every usage of hash XOR-ing. #4137 In `massa-hash/src/hash.rs`:

  ```rust
  // Previously, the final state hash was a XOR of various hashes.
  // However, this is vulnerable: https://github.com/massalabs/massa/discussions/3852
  // As a result, we use lsmtree's Sparse Merkle Tree instead, which is not vulnerable to this.
  // We still use bitwise XOR for fingerprinting on some structures.
  // TODO: Remove every usage of this?
  impl BitXorAssign for Hash {
  ```
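
  For context, a tiny standalone demo (using std hashing, not the massa-hash API) of two properties that make XOR folding weak as a state fingerprint: it is order-insensitive, and any element folded in twice cancels out:

  ```rust
  // Standalone illustration with std hashing, not massa-hash.
  use std::collections::hash_map::DefaultHasher;
  use std::hash::{Hash, Hasher};

  fn h(x: &str) -> u64 {
      let mut hasher = DefaultHasher::new();
      x.hash(&mut hasher);
      hasher.finish()
  }

  fn main() {
      let fingerprint_a = h("ledger_entry_1") ^ h("ledger_entry_2");
      // Folding the same element in twice leaves the fingerprint unchanged...
      let fingerprint_b = fingerprint_a ^ h("injected") ^ h("injected");
      assert_eq!(fingerprint_a, fingerprint_b);
      // ...and the order of the elements never matters.
      let reordered = h("ledger_entry_2") ^ h("ledger_entry_1");
      assert_eq!(fingerprint_a, reordered);
  }
  ```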
- [x] Implement a MassaDBController trait to avoid messy dependencies: https://github.com/massalabs/massa/issues/4046
  - [x] Create a massa-db-exports crate defining the trait, used everywhere
  - [x] Rename massa-db to massa-db-worker
  - [x] Every other crate should only import massa-db-exports
  - [x] We also need functions for reading from rocksdb exposed in the controller trait (expose get_cf, get_iterator): this eliminates the need to have rocksdb as a dependency every time we want to read from the DB (see the sketch below).
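
  A rough sketch of the kind of separation this enables (names and signatures are illustrative, not the actual massa-db-exports trait): the exports crate defines only the trait, the worker crate wraps rocksdb behind it, and every other crate depends on the trait alone:

  ```rust
  // Illustrative only: not the real MassaDBController trait or its signatures.
  // Something shaped like this would live in massa-db-exports, the rocksdb-backed
  // implementation in massa-db-worker, and consumers would import massa-db-exports only.
  pub trait MassaDBController: Send + Sync {
      /// Read a single key from the given column family.
      fn get_cf(&self, cf_name: &str, key: &[u8]) -> Option<Vec<u8>>;

      /// Iterate over all key/value pairs of a column family, so callers
      /// never need a direct rocksdb dependency just to read the DB.
      fn get_iterator(&self, cf_name: &str) -> Box<dyn Iterator<Item = (Vec<u8>, Vec<u8>)> + '_>;

      /// Apply a batch of (column family, key, optional value) writes atomically;
      /// `None` means deletion.
      fn write_batch(&mut self, changes: Vec<(String, Vec<u8>, Option<Vec<u8>>)>);
  }

  // A consumer crate then holds a `Box<dyn MassaDBController>` (or a shared handle
  // to one) without ever importing rocksdb.
  ```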
- [x] Change the constant `MAX_BOOTSTRAPPED_NEW_ELEMENTS` from 1000 to 500 to avoid reaching the size limit of a Bootstrap message.
- [x] Clean up the bootstrap cursor's behaviour. E.g. `StreamingStep::Finished` should not keep a handle on the `last_key`; instead, don't filter the changes if we are finished. This should simplify the match.
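
  A simplified sketch of the idea (the enum and function here are hypothetical, not the actual massa StreamingStep type): once the cursor is finished it no longer carries a key, and the caller simply stops filtering:

  ```rust
  // Hypothetical, simplified streaming cursor used only to illustrate the cleanup.
  enum StreamingStep {
      Started,
      Ongoing { last_key: Vec<u8> },
      // After the cleanup, Finished carries no last_key handle.
      Finished,
  }

  fn keep_change(cursor: &StreamingStep, key: &[u8]) -> bool {
      match cursor {
          // Nothing streamed yet: no already-sent keys to keep up to date.
          StreamingStep::Started => false,
          // Mid-stream: only forward changes for keys the client already received.
          StreamingStep::Ongoing { last_key } => key <= last_key.as_slice(),
          // Finished: the whole state was sent, so don't filter the changes at all.
          StreamingStep::Finished => true,
      }
  }
  ```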
- [ ] Limit the number of updates we send during Bootstrap. We keep sending all the changes, but if they are too big, we should send an error to the client so that they can bootstrap again from scratch.
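
  One possible shape for such a guard (names and limit are entirely hypothetical; the point is only to cap the payload and tell the client to restart instead of streaming unbounded changes):

  ```rust
  // Hypothetical sketch, not the actual massa bootstrap server code.
  #[derive(Debug)]
  enum BootstrapStreamError {
      /// The accumulated changes are too large to stream safely;
      /// the client should bootstrap again from scratch.
      ChangesTooLarge { size: usize, limit: usize },
  }

  fn check_changes_size(serialized_changes: &[u8]) -> Result<(), BootstrapStreamError> {
      // Hypothetical cap; a real limit would derive from the Bootstrap message size limit.
      const MAX_STREAMED_CHANGES_BYTES: usize = 1_000_000;
      if serialized_changes.len() > MAX_STREAMED_CHANGES_BYTES {
          return Err(BootstrapStreamError::ChangesTooLarge {
              size: serialized_changes.len(),
              limit: MAX_STREAMED_CHANGES_BYTES,
          });
      }
      Ok(())
  }
  ```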
- [x] #4064

In this issue the only thing that is left and critical is: ensure receiving a non-valid db after bootstrapping does not lead to a panic; instead, we should restart bootstrap from scratch. Right?

@AurelienFT Depending on the Bootstrap tests made by @litchipi, we may want to ensure that the following is not an issue: "Limit the number of updates we send during Bootstrap. We keep sending all the changes, but if they are too big, we should send an error to the client so that they can bootstrap again from scratch." We don't want to redesign the bootstrapping of the changes, as it would be too complicated, but we should ensure no security risk is presented by this.

@Leo-Besancon Do you think there is still something relevant here? If that's the case, maybe we should make a separate issue and close this big one.

@AurelienFT For me, all the remaining tasks are relevant, yes:
- downtime interpolation optimization
- bootstrapping again instead of panicking on a bad db check
- limiting the number of updates

However, they are certainly low-priority tasks.