aptos-core [api] Add REST API to modify failpoints during execution

Description

Adding failpoints REST API (/failpoints?name=str&actions=str), to allow modifying failpoints during execution. This will allow more targeted/surgical testing - both within smoke tests as well as forge/cluster tests

Test Plan

Used in new smoke tests in the stacked diffs.

This change is

Jul 18 '22 15:07 ikabiljo

As far as I know, @banool is working on generating OpenAPI spec from code. This might get such endpoints populated in the spec and confuse developers.

Do we have other ways to config this on the fly besides relying on the Rest API?

Jul 18 '22 16:07 jjleng

I think having this as part of public api is confusing too

Jul 18 '22 16:07 zekun000

I feel like it makes more sense to configure this via runtime arguments (the config / CLI). Do you expressly need to modify failpoints during a single process lifetime?

Jul 18 '22 17:07 banool

I am fine to change this to a different alternative.

But I do need to change it multiple times, dynamically, during process execution. Looking at smoke test setup I only have REST API as the client to interact with it :) so open to alternatives, if anyone has a suggestion?

I can also make this fully behind failpoints flag, so with that - it shouldn't affect generating public docs?

Jul 18 '22 17:07 ikabiljo

With the API changes I'm making we can conditionally include certain groups of endpoints in the API. This should definitely be an opt in feature, we don't want it to be possible for random people to make a node's endpoints start failing. But I'm not quite ready with that stuff yet. How urgently do you need this feature?

Jul 18 '22 17:07 banool

@banool - how about this ? if you don't compile with failpoints feature, no new endpoints will be added?

Jul 18 '22 20:07 ikabiljo

We don't really configure much via features today for the API, all configuration is done via a config file. You can imagine that having different permutations of features for our node would result in an explosion in the number of docker images we build (though we could of course just not do that). Either way, I think having another flag in our config file would be more standard.

Jul 18 '22 20:07 banool

So to confirm, you are fine with conditionally including this into REST API?

failpoints library requires features enabled, so from that perspective - even if there was config, if it wasn't compiled with appropriate feature - the rest of the code wouldn't work. So I would leave it as is, and if me move from failpoints to something custom, change from feature to config to enable it. sounds good?

Jul 19 '22 03:07 ikabiljo

Yeah my stance is:

Configuring via the API is okay given you need to be able to change these failpoints within a single process lifetime.
The endpoints should only be included if an opt-in flag is set in the config.

This second point is possible with the new API I'm making, not sure about the previous one (but probably also).

Jul 19 '22 04:07 banool

Agreed with Igor. The API crate already supports the failpoints feature, we need to include the endpoint only when failpoints feature is enabled. Once this is in place, the opt-in flag is not needed for the endpoint, I guess.

Jul 19 '22 16:07 jjleng

Okay fair enough, we can do that.

Jul 19 '22 17:07 banool

I'm with @banool, I think features are great until you realize you accidentally deployed a binary with a feature enabled and the only way to disable that feature is by providing a new binary. I'm good with this PR if we can make exposing this configurable via a config file and not just a compilation flag.

Jul 24 '22 05:07 davidiw

Wow a lot of discussion. Yes, this has to be configured by config and not API, otherwise you could have some unfortunate mistakes without a way to quickly fix it

Jul 24 '22 16:07 gregnazario

That's not quite what the latest discussion has been about, we've been discussing whether to enable the failpoints API at all via either a feature or a config flag.

I think we agree that dynamically setting failpoints with the API is reasonable.

Jul 24 '22 22:07 banool

Added node_config.api.failpoints_enabled that is false by default, that needs to be enabled to allow setting failpoints through that endpoint. Let me know if you have any more concerns.

Jul 28 '22 01:07 ikabiljo

:white_check_mark: Forge test success on `b108e7421ec67cb38d4011640497c757fde2570f`

all up : 4844 TPS, 2282 ms latency, 3000 ms p99 latency,no expired txns

Jul 31 '22 17:07 github-actions[bot]

Heads up: I'll add this to the new API too. Should be transparent from your side.

Jul 31 '22 18:07 banool

aptos-core aptos-core copied to clipboard

[api] Add REST API to modify failpoints during execution

Description

Test Plan

:white_check_mark: Forge test success on b108e7421ec67cb38d4011640497c757fde2570f

aptos-core
aptos-core copied to clipboard

:white_check_mark: Forge test success on `b108e7421ec67cb38d4011640497c757fde2570f`