Benchmark `System::set_code`
Currently we hard-code weights for `System::set_code` and `System::set_code_without_checks`:

```rust
#[pallet::call_index(2)]
#[pallet::weight((T::BlockWeights::get().max_block, DispatchClass::Operational))]
pub fn set_code(origin: OriginFor<T>, code: Vec<u8>) -> DispatchResultWithPostInfo {
```
This is inflexible and effectively makes runtime upgrades un-schedulable, since the call is weighted as a full block. We should benchmark these calls instead.
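For reference, a minimal sketch of what a benchmark for this call could look like, written as if it lived in `frame/system/src/benchmarking.rs` and using the v1 `benchmarks!` macro. The wasm blob path is a placeholder; picking a suitable blob is exactly the open question of this issue.

```rust
// Sketch only; `Config`, `Event`, `Call` and `Pallet` are frame_system's own items.
use frame_benchmarking::benchmarks;
use frame_system::{Call, Config, Event, Pallet, RawOrigin};

benchmarks! {
    set_code {
        // Placeholder: a valid (compressed) runtime blob whose spec_name matches
        // the benchmarking runtime and whose spec_version is higher.
        let code = include_bytes!("placeholder_runtime.compact.compressed.wasm").to_vec();
    }: _(RawOrigin::Root, code)
    verify {
        // `set_code` deposits `CodeUpdated` once the new code has been accepted.
        Pallet::<T>::assert_last_event(Event::<T>::CodeUpdated.into());
    }
}
```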
cc @gavofyork
@shawntabrizi said that he had issues back in the days when he tried it, but he didn't really remember them.
Can I try it? So we need a codesize-dependent benchmark here?
> Can I try it? So we need a codesize-dependent benchmark here?
Yes.
@shawntabrizi left this comment. I doubt that I can set up a benchmark if even Shawn failed.
But the comment is from 04/2020, maybe something changed and it's easier now.
You should try some different code sizes of runtimes, but it should not change the behavior of the benchmark. We don't iterate the wasm code or anything like that, so the size should not change the time it takes to execute the function. For sure there is some impact, but I don't think it is measurable or really changes the numbers.
> it is pretty hard to come up with a real Wasm runtime to test the upgrade with
The `set_code` parses the version and then puts it into storage. Can you not mock that and use a byte-block with a prepended `RuntimeVersion`?
> The `set_code` parses the version and then puts it into storage.
It is not doing that :P It is reading the version to compare it against the current version. The `LastRuntimeUpgrade` version is changed when the new runtime runs the first time, so not really important for this function. In the past we still needed to instantiate the runtime to fetch the version, but now we put the runtime version into a section of the wasm binary and reading this should be almost constant time (the only thing here being different is the decompression time of the wasm binary).
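For illustration, this is roughly what that version lookup amounts to from the runtime's point of view (a sketch; the helper name is made up, `sp_io::misc::runtime_version` returns the SCALE-encoded version extracted from the blob):

```rust
use codec::Decode;
use sp_version::RuntimeVersion;

/// Read the `RuntimeVersion` of a candidate wasm blob via the host.
/// Returns `None` if no version could be extracted.
fn embedded_version(code: &[u8]) -> Option<RuntimeVersion> {
    sp_io::misc::runtime_version(code)
        .and_then(|encoded| RuntimeVersion::decode(&mut &encoded[..]).ok())
}
```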
And said decompression time, is it dependent on the content? So if I use a vec with constant data, does it have a different runtime characteristic than a vec with random data?
I would propose you just collect some runtimes. We have 2 runtimes in Substrate itself, 4 in Polkadot and even more in Cumulus. If you compile all of them and use them for the benchmarking you are probably good. Maybe you could also use Acala and/or Moonbeam, as I remember that they had relatively big runtimes.
> So if I use a vec with constant data, does it have a different runtime characteristic than a vec with random data?
Probably, but I'm no compression expert. :D
I don't understand. I should compile and run the benchmarks on my system and just add my determined weight-fn for set_code?
Or should I collect the runtimes somewhere (commit the blobs to git 🤔) and let the benchmarks run on CI hardware like a normal benchmark?
Is it really possible to create a good linear regression model with these few datapoints?
> Or should I collect the runtimes somewhere (commit the blobs to git 🤔) and let the benchmarks run on CI hardware like a normal benchmark?
This one.
> Is it really possible to create a good linear regression model with these few datapoints?
I meant this as some sort of starting point to get a "feeling" on how the size of the data influences the weight. Later we can still think of adding some fake data to the wasm file or something like that to increase the size.
Ok, what would be a good place in the repo for the runtimes?
How about including them in the binary with `include_bytes!` when compiling with the `runtime-benchmarks` feature?
> How about including them in the binary with `include_bytes!` when compiling with the `runtime-benchmarks` feature?
Sounds good.
> Ok, what would be a good place in the repo for the runtimes?
Somewhere in the frame-system crate, but we first should find out if we really need them by doing some experiments. We can still come up with a good place for them later.
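A sketch of how the gated blobs could look (file names and the constant are made up for illustration):

```rust
// Prebuilt, compressed runtime blobs committed next to the crate; they are only
// compiled into the binary when the benchmarking feature is enabled.
#[cfg(feature = "runtime-benchmarks")]
pub const RUNTIMES: &[&[u8]] = &[
    include_bytes!("../res/westend_runtime.compact.compressed.wasm"),
    include_bytes!("../res/polkadot_runtime.compact.compressed.wasm"),
    include_bytes!("../res/kusama_runtime.compact.compressed.wasm"),
    include_bytes!("../res/kitchensink_runtime.compact.compressed.wasm"),
];
```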
I ran into a problem. When I want to use a prebuilt binary for the benchmark, I get the error:
`Input("Benchmark frame_system::set_code failed: InvalidSpecName")`
This error originates from `Pallet::can_set_code`, where the `spec_name` and `spec_version` are checked.
Is there a tool to manipulate these values in the binary files? If not, would it make sense to develop such a tool?
If you just test them one-by-one, wouldn't it be enough to change your runtime version to match it in the mock.rs? https://github.com/paritytech/substrate/blob/3e71d606b7d1e91d9c1701c0a443530eefca1a39/frame/system/src/mock.rs#L49
Or disable the check when the `runtime-benchmarks` feature flag is set.
> If you just test them one-by-one, wouldn't it be enough to change your runtime version to match it in the mock.rs? https://github.com/paritytech/substrate/blob/3e71d606b7d1e91d9c1701c0a443530eefca1a39/frame/system/src/mock.rs#L49
In the tests it seems to be `system-test`, but when I run the benchmarks from the substrate binary, the `impl_name` is `substrate-node`.
> Or disable the check when the `runtime-benchmarks` feature flag is set.
Which slightly skews the benchmark because of the skipped code.
If you run the benchmarks in the node you would have to change it in the node runtime then: https://github.com/paritytech/substrate/blob/2dbf62591fd35725c6e39f8ddfd27c3695dab594/bin/node/runtime/src/lib.rs#L128
Is it possible to change that at runtime? Isn't it a compiled-in constant?
> Is it possible to change that at runtime? Isn't it a compiled-in constant?
Why do you need to change it in the runtime? I thought you were doing one-off testing with different WASM blobs.
> Is it possible to change that at runtime? Isn't it a compiled-in constant?
> Why do you need to change it in the runtime? I thought you were doing one-off testing with different WASM blobs.
Ah, I misunderstood you. You meant I should change it for crafting my test wasm runtimes?
One issue is that the names differ depending on whether you run the benchmarks or the benchmark tests, so one of them will fail.
I think the "slowest part" of `set_code` is calling into the runtime to get the runtime version of the new wasm blob. As long as we do this, we can probably disable the checks when the `runtime-benchmarks` feature is enabled.
> I think the "slowest part" of `set_code` is calling into the runtime to get the runtime version of the new wasm blob. As long as we do this, we can probably disable the checks when the `runtime-benchmarks` feature is enabled.
But isn't this a possible footgun? If someone compiles production code with the `runtime-benchmarks` feature, they accidentally disable these validity checks.
This check is just there to prevent human error; it is not a crucial check. People can also use `set_storage` or `set_code_without_checks` to circumvent these checks.
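A sketch of what that feature-gating could look like inside frame_system's `can_set_code` (simplified; the error variants match the ones the earlier error message pointed at, but this is not a verbatim copy of the upstream code):

```rust
impl<T: Config> Pallet<T> {
    /// Simplified rendering of `can_set_code` with the sanity checks compiled out
    /// for benchmark builds, so arbitrary prebuilt blobs can be passed to `set_code`.
    pub fn can_set_code(code: &[u8]) -> Result<(), sp_runtime::DispatchError> {
        #[cfg(not(feature = "runtime-benchmarks"))]
        {
            use codec::Decode;

            let current_version = T::Version::get();
            let new_version = sp_io::misc::runtime_version(code)
                .and_then(|v| sp_version::RuntimeVersion::decode(&mut &v[..]).ok())
                .ok_or(Error::<T>::FailedToExtractRuntimeVersion)?;

            if new_version.spec_name != current_version.spec_name {
                return Err(Error::<T>::InvalidSpecName.into());
            }
            if new_version.spec_version <= current_version.spec_version {
                return Err(Error::<T>::SpecVersionNeedsToIncrease.into());
            }
        }

        // The parameter is unused when the checks are compiled out.
        let _ = code;
        Ok(())
    }
}
```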
Now I have an array of runtime blobs which I iterate like this in the benchmark setup fn:
```rust
let i in 0 .. (RUNTIMES.len()-1) as u32;
```
But `i` is not the correct complexity parameter for the weight function; it's `RUNTIMES[i].len()`.
Is it possible to iterate and remap `i`?
The first manually evaluated results (on my weak notebook):
Repeat: 20, Steps: 50

| runtime | size (KB) | mean (µs) | sigma (µs) | sigma/mean (%) |
|---|---|---|---|---|
| kitchensink | 4152 | 65970 | 556.5 | 0.8% |
| kusama | 1376 | 43110 | 155.1 | 0.3% |
| polkadot | 1232 | 38140 | 528.7 | 1.3% |
| westend | 1132 | 35970 | 55.72 | 0.1% |
| rococo | 1120 | 37060 | 74.91 | 0.2% |
| moonbeam | 1252 | 43510 | 66.85 | 0.1% |
| moonriver | 1304 | 45930 | 61.84 | 0.1% |
| moonbase | 1332 | 46950 | 35.26 | 0.0% |

Okay, thanks for the results. 65 ms is really fast compared to the 1.5 s that we currently estimate.
If the difference between a small and a large runtime is just 40%, then we can probably use the large one and don't benchmark with a component.
It is also very fortunate that the kitchensink is the largest, so we don't need to pull in external deps.
> But `i` is not the correct complexity parameter for the weight function; it's `RUNTIMES[i].len()`.
Hm, no it's not, to my knowledge… but you could map `i` to the runtime with the closest size.
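Something like this could express that mapping (a sketch only; the bounds and the `RUNTIMES` constant are the hypothetical ones from above, and the imports are as in the earlier sketch):

```rust
benchmarks! {
    set_code {
        // Interpret the component as a target code size in KB and benchmark with
        // the prebuilt blob whose actual size is closest to it.
        let s in 1_000 .. 4_200;
        let code = RUNTIMES
            .iter()
            .min_by_key(|blob| (blob.len() as i64 / 1024 - s as i64).unsigned_abs())
            .expect("at least one runtime blob is included")
            .to_vec();
    }: _(RawOrigin::Root, code)
}
```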
So I'd change my bench to just use the kitchensink runtime and we assume that's the maximum size?
BTW: Wouldn't it be nice to have more control over the components than just using a range of values? How about adding an additional iterator interface to manually define sample points? Or adding a mapping function to iterate and remap sample points?
So if I only benchmark the kitchensink, I could also take the wasm binary from the target dir and we don't have to include an extra runtime just for testing.
What's the canonical way to find the filesystem path for the runtime? Is there some function to query it, or should I use cargo environment variables?
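One possibility (assuming the runtime crate can be pulled in as a benchmark-only dependency; the crate name here is illustrative) is to avoid the filesystem entirely, since wasm-builder already exposes the compiled blob as a constant:

```rust
#[cfg(feature = "runtime-benchmarks")]
fn benchmark_code() -> Vec<u8> {
    // `WASM_BINARY` is `Option<&[u8]>`: it is `None` when the runtime crate was
    // built with the wasm build skipped (e.g. `SKIP_WASM_BUILD=1`).
    kitchensink_runtime::WASM_BINARY
        .expect("runtime wasm must be built for benchmarking")
        .to_vec()
}
```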