ref-fvm
[Investigation] Reported performance regression when updating wasmtime from 19 to 24.
We've seen a reported performance regression that appears to be associated with the wasmtime update that happened in FVM 4.4 (wasmtime 19 -> 24). This was an interesting performance regression because:
- It shows up even when updating FVM 2.7 to 2.9, even when the updated FVM isn't actually being used, just included in the build. This indicates that the issue is in some common wasmtime dependency.
- None of the gas benchmarks in FVM v4 are picking up the regression, but those benchmarks mostly focus on non-wasm things. Importantly, I'm reasonably sure that:
  - It's not related to hashing.
  - It's not related to signature verification.
  - It's not related to calling into wasm contracts (see #2057).
  - It's not related to trapping and running out of gas (we have an explicit benchmark for this because it occasionally rears its head and turns into an $N^2$ problem).
  - It shouldn't be cranelift. That tends to be version-locked with a specific wasmtime version.
Things I haven't checked:
- The cost of wasm instructions in general. We priced wasm instructions long ago and it was a bit of a manual process.
- The cost of jumping in and out of wasm. Wasmtime has made some recent changes here; however, I'd expect any issues there to be specific to the wasmtime version in use (observation (1) above appears to conflict with this). A rough way to measure this overhead is sketched after this list.
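For context, here is a minimal standalone sketch of how one might get a rough number for host↔wasm call overhead with wasmtime, outside the FVM entirely. The module, export name, and iteration count are illustrative only; it assumes the `wasmtime` and `anyhow` crates, with wasmtime's default `wat` feature enabled.

```rust
// Illustrative sketch: time repeated calls into a trivial wasm export to get a
// rough per-call host<->wasm transition cost. Not the FVM's actual benchmark.
use std::time::Instant;
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // A trivial module exporting a no-op function (text format via the `wat` feature).
    let module = Module::new(&engine, r#"(module (func (export "noop")))"#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let noop = instance.get_typed_func::<(), ()>(&mut store, "noop")?;

    let start = Instant::now();
    for _ in 0..1_000_000 {
        noop.call(&mut store, ())?;
    }
    println!("rough per-call overhead: {:?}", start.elapsed() / 1_000_000);
    Ok(())
}
```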
All I can think of now is... rustix? Or something like that?
My next step is to try to update FVM v2 in forest to see what deps get updated along with it.
We've confirmed that having wasmtime v24 anywhere in the build tree, even if unused, causes this slowdown.
- It's not the `write_core` feature in `object`, as far as I can tell.
- I'm also seeing some changes in gimli, but v24 uses a different version of that, so it shouldn't matter.
Interesting, I'm seeing smallvec now depending on serde. I can't imagine how that might be relevant, but I'll need to check.
I've tracked it down to wasmtime's cranelift feature (it does not appear to be feature unification, but I may be missing something). I'm now trying with just wasmtime-cranelift.
- wasmtime-environ doesn't reproduce.
- cranelift-wasm does.
cranelift-codegen reproduces; trying cranelift-control now.
cranelift-control does not. Trying cranelift-codegen-meta, cranelift-codegen-shared, and cranelift-isle.
Ok, it is cranelift-codegen specifically. Now I'm trying to see if it's because we're not supporting some architecture by building with CARGO_FEATURE_ALL_ARCH.... we'll see if that even works.
Er, trying with reduced features first. Only "std" and "unwind" because those are required.
Ok, still broken with those features. Now I'm trying with "all arch" to see if it's some kind of ISLE issue (I think it is, but I'm not sure if that's the fix).
Enabling all architectures doesn't help, and I can't skip the ISLE build (that option only exists if it's pre-built). Now I'm bisecting cranelift-codegen, starting with 0.109.
Ok, it's cranelift-codegen 0.107.0 exactly. Now I'm testing a build with all the FVM crates fully updated to get us all on a single wasmtime version just in case it's an issue with multiple wasmtime versions.
Ok, it is feature unification: specifically, the trace-log feature getting enabled on regalloc2. (Cargo unifies features for a given crate version across the whole build graph, so once anything in the tree enables that feature, every crate sharing the same regalloc2 gets it.)
My hypothesis is that this is actually just a compilation slowdown, not an execution slowdown. We're only running through ~24 epochs here and will have to pause to lazily (IIRC?) compile actors every time we load a new one. This is my guess because I'm noticing that epochs 1 & 4 are taking a while.
Additionally, if we're speeding through multiple network upgrades (especially if we're switching FVM versions), we'll have to re-compile the actors per network version.
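If that's right, the cost should show up in wasmtime's ahead-of-time compile step rather than in execution. A rough sketch of separating the two is below; the path is a placeholder and this is not the FVM's actual actor-loading code.

```rust
// Illustrative only: time the one-time cranelift compile (Module::new)
// separately from reloading a precompiled artifact, which skips cranelift.
use std::time::Instant;
use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let wasm = std::fs::read("actor.wasm")?; // placeholder path

    let t = Instant::now();
    let module = Module::new(&engine, &wasm)?; // full cranelift compile
    println!("compile: {:?}", t.elapsed());

    let artifact = module.serialize()?;
    let t = Instant::now();
    // Safe only for artifacts we produced ourselves with the same engine config.
    let _cached = unsafe { Module::deserialize(&engine, &artifact)? };
    println!("reload precompiled: {:?}", t.elapsed());
    Ok(())
}
```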
@Stebalien This could be verified by running a longer benchmark without network upgrades. The .env file would need to be modified along the lines of:
LOTUS_IMAGE=ghcr.io/chainsafe/lotus-devnet:2024-10-10-600728e
FOREST_DATA_DIR=/forest_data
LOTUS_DATA_DIR=/lotus_data
FIL_PROOFS_PARAMETER_CACHE=/var/tmp/filecoin-proof-parameters
MINER_ACTOR_ADDRESS=f01000
LOTUS_RPC_PORT=1234
LOTUS_P2P_PORT=1235
MINER_RPC_PORT=2345
FOREST_RPC_PORT=3456
FOREST_OFFLINE_RPC_PORT=3457
F3_RPC_PORT=23456
F3_FINALITY=100000
GENESIS_NETWORK_VERSION=24
SHARK_HEIGHT=-10
HYGGE_HEIGHT=-9
LIGHTNING_HEIGHT=-8
THUNDER_HEIGHT=-7
WATERMELON_HEIGHT=-6
DRAGON_HEIGHT=-5
WAFFLE_HEIGHT=-4
TUKTUK_HEIGHT=-3
TARGET_HEIGHT=200
Note that the timeout would also need to be extended. Then, we compare the timings before and after the FVM upgrade.
That way, we will be able to confirm whether the slowdown is coming from re-compiling actors (which is not a big issue).
@Stebalien I believe your hypothesis holds. I tried with the config above locally (on a machine that previously reported a 50% slowdown).
FVM 4.3.1
________________________________________________________
Executed in 781.81 secs fish external
usr time 5.92 secs 0.00 micros 5.92 secs
sys time 5.95 secs 696.00 micros 5.95 secs
FVM 4.4
________________________________________________________
Executed in 782.58 secs fish external
usr time 5.81 secs 0.00 micros 5.81 secs
sys time 6.01 secs 811.00 micros 6.01 secs
So I guess it's "okay-ish" in the sense that the slowdown only occurs under some synthetic conditions. That said, it might make sense to report it to the cranelift maintainers.
Yep, I plan on reporting it upstream. I'm also looking into possibly disabling the offending feature (requires upstream help, but I'm guessing it's only needed for GC which we don't use).
Also, for forest, you can modify the "quick" compilation profile to optimize cranelift-codegen (or maybe just regalloc2); Cargo supports per-package profile overrides, e.g. a `[profile.quick.package.cranelift-codegen]` section with a higher `opt-level`. That didn't get it to be quite fast enough to pass the test in the 5m timeout, but it definitely improved the situation.
I'm also going to write a quick "load/compile the actors" benchmark.
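Something along these lines, presumably; a hypothetical criterion skeleton where the wasm path and bench name are placeholders, not the actual FVM benchmark harness:

```rust
// Hypothetical skeleton for a "load/compile the actors" benchmark: measures
// the cranelift compile step for a single actor wasm blob. Placeholder paths.
use criterion::{criterion_group, criterion_main, Criterion};
use wasmtime::{Engine, Module};

fn compile_actors(c: &mut Criterion) {
    let engine = Engine::default();
    // Placeholder: in practice, iterate over the actor bundle's wasm blobs.
    let wasm = std::fs::read("actors/actor.wasm").expect("read actor wasm");
    c.bench_function("compile_actor", |b| {
        b.iter(|| Module::new(&engine, &wasm).expect("compile"));
    });
}

criterion_group!(benches, compile_actors);
criterion_main!(benches);
```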
Oh, lol, no. This trace-log is literally just for debugging. I don't think this was supposed to get shipped to production.
> I don't think this was supposed to get shipped to production.

Nvm:

> # Note that this just enables `trace-log` for `clif-util` and doesn't turn it on
> # for all of Cranelift, which would be bad.

It looks like they thought this wouldn't have a performance impact.
Ah, that comment is saying that trace-log is only enabled for clif-util.
And... already fixed but not in the version we're using: https://github.com/bytecodealliance/wasmtime/pull/9128. Time to update to v25, I guess.