
Instream drop on demand

Open willemolding opened this issue 6 years ago • 1 comments

PR summary

To test how the system behaves when the network misbehaves, we need a way to artificially induce failures at different layers of the network stack.

We propose using env vars that are picked up inside the in_stream crate to drop connections, messages, or packets, depending on the test requirement. The current implementation only drops ws connections, which it does by ignoring incoming and outgoing messages for the duration of an error burst.

As this crate is to be used by both the sim2h worker and sim2h server we can simulate failures at either end with no extra work.

Dropping ws Connections

The env var WS_FAILURE_MODEL is a tuple (S, MTBF, MFD), where S is a seed for the random number generator to ensure repeatability, MTBF is the mean time between failures in ms, and MFD is the mean failure duration in ms.
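To make the tuple format concrete, here is a minimal sketch of how such a value could be parsed. The `FailureModel` struct and both function names are hypothetical and are not taken from the in_stream crate; the sketch also assumes the tuple is written exactly as in the examples in this description, with parentheses and comma-separated numbers.

```rust
/// Hypothetical container for the three values in WS_FAILURE_MODEL.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FailureModel {
    /// Seed for the random number generator (S).
    pub seed: u64,
    /// Mean time between failures, in milliseconds (MTBF).
    pub mtbf_ms: f64,
    /// Mean failure duration, in milliseconds (MFD).
    pub mfd_ms: f64,
}

/// Parse a string such as "(42, 2000, 100)" into a FailureModel.
/// Returns None for anything that is not exactly three values in parentheses,
/// or when either mean is not a positive number.
pub fn parse_failure_model(raw: &str) -> Option<FailureModel> {
    let inner = raw.trim().strip_prefix('(')?.strip_suffix(')')?;
    let mut parts = inner.split(',').map(str::trim);
    let seed = parts.next()?.parse::<u64>().ok()?;
    let mtbf_ms = parts.next()?.parse::<f64>().ok()?;
    let mfd_ms = parts.next()?.parse::<f64>().ok()?;
    // Reject extra fields and non-positive (or NaN) means.
    if parts.next().is_some() || !(mtbf_ms > 0.0) || !(mfd_ms > 0.0) {
        return None;
    }
    Some(FailureModel { seed, mtbf_ms, mfd_ms })
}

/// Read the model from the environment; None when the variable is unset or malformed.
pub fn failure_model_from_env() -> Option<FailureModel> {
    std::env::var("WS_FAILURE_MODEL")
        .ok()
        .and_then(|raw| parse_failure_model(&raw))
}
```

Returning `None` when the variable is unset or malformed would let a caller treat "no failure model" as the default, so ordinary runs are unaffected.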

The failure model follows a telegraph process, which switches between an error state (burst) and an ok state. The time between errors and the error burst duration are exponentially distributed random variables, independent of each other.
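The telegraph process described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `Telegraph` type and its methods are hypothetical, MTBF is interpreted as the mean length of an ok period, and a small SplitMix64 generator stands in for whatever random number generator the crate actually uses so that the sketch needs no external dependencies.

```rust
/// Small deterministic generator (SplitMix64), used only to keep the sketch
/// dependency-free. It is an assumption, not the crate's actual RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Exponential sample with the given mean, via inverse transform sampling.
    fn next_exp(&mut self, mean: f64) -> f64 {
        -mean * (1.0 - self.next_f64()).ln()
    }
}

/// Hypothetical two-state (ok / burst) telegraph process.
pub struct Telegraph {
    rng: SplitMix64,
    mtbf_ms: f64,
    mfd_ms: f64,
    in_burst: bool,
    next_switch_ms: f64,
}

impl Telegraph {
    /// Start in the ok state; the first burst begins after an exponential delay.
    pub fn new(seed: u64, mtbf_ms: f64, mfd_ms: f64) -> Self {
        assert!(mtbf_ms > 0.0 && mfd_ms > 0.0, "means must be positive");
        let mut rng = SplitMix64(seed);
        let next_switch_ms = rng.next_exp(mtbf_ms);
        Telegraph { rng, mtbf_ms, mfd_ms, in_burst: false, next_switch_ms }
    }

    /// Returns true if a message seen at `now_ms` falls inside an error burst.
    /// `now_ms` is expected not to decrease between calls.
    pub fn should_drop(&mut self, now_ms: f64) -> bool {
        // Advance through every state switch that happened up to `now_ms`.
        while now_ms >= self.next_switch_ms {
            self.in_burst = !self.in_burst;
            let mean = if self.in_burst { self.mfd_ms } else { self.mtbf_ms };
            self.next_switch_ms += self.rng.next_exp(mean);
        }
        self.in_burst
    }
}
```

Under this interpretation, the long-run fraction of time spent in a burst is MFD / (MTBF + MFD), about 4.8% for the (42, 2000, 100) example. Because the generator is seeded, two instances built with the same seed produce the same sequence of bursts, which is what makes a failing run repeatable.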

It should be possible to run any of the existing tests with this env var set to see how they operate under unreliable network conditions.

Example:

WS_FAILURE_MODEL="(42, 2000, 100)" hc-app-spec-test-sim2h

This would run the app spec tests with the client experiencing an error burst on average every 2 seconds, each lasting 100 ms on average.

WS_FAILURE_MODEL="(42, 2000, 100)" sim2h_server

This would run the sim2h server with the same failure rates.

It is also integrated with the trycp server, so calling the RPC method spawn with a failureModel object will set the env var internally, e.g.:

await ws.call('spawn', {"id": "my-player-failure", "failureModel": {"seed": 42, "MTBF": 100, "MFD": 100}})
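As a rough illustration of what setting the env var internally could look like on the server side, the sketch below formats the three values and attaches them to a child process. Both function names and the `program` parameter are hypothetical; how the trycp server actually spawns its conductor is not shown in this description.

```rust
use std::process::Command;

/// Format the three values the way the WS_FAILURE_MODEL examples write them.
pub fn failure_model_env_value(seed: u64, mtbf_ms: u64, mfd_ms: u64) -> String {
    format!("({}, {}, {})", seed, mtbf_ms, mfd_ms)
}

/// Build (but do not start) a child process with WS_FAILURE_MODEL set.
/// `program` is a placeholder for whatever binary the server spawns.
pub fn command_with_failure_model(
    program: &str,
    seed: u64,
    mtbf_ms: u64,
    mfd_ms: u64,
) -> Command {
    let mut cmd = Command::new(program);
    cmd.env(
        "WS_FAILURE_MODEL",
        failure_model_env_value(seed, mtbf_ms, mfd_ms),
    );
    cmd
}
```

Setting the variable on the child `Command` rather than on the server's own process would keep the failure model scoped to the one spawned player.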

followups

( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )

changelog

  • [ ] if this is a code change that affects some consumer (e.g. zome developers) of holochain core, then it has been added to our between-release changelog with the format
- summary of change [PR#1234](https://github.com/holochain/holochain-rust/pull/1234)

documentation

willemolding, Dec 13 '19 01:12

As per stand-up today: to make this usable for testing, being able to specify the values from the trycp_server "spawn" command is needed. Will review as soon as that's ready.

zippy, Dec 16 '19 19:12