test/passes/65536_locals_for_liveness.wasm is suuuuper slow
To the point where I haven't been patient enough for it to finish locally and I've temporarily deleted it to unblock running the rest of the test suite. Did something change with coalesce-locals recently?
Just as a drive-by, maybe the change in https://github.com/WebAssembly/binaryen/pull/4567 is related? Presumably before this change the test wasn't actually running.
EDIT: although clearly I've fallen into a time-hole because I didn't realise it's been nearly a month since this change was made. Sorry for the noise.
Yes, it seems likely it is that PR. Running say ./auto_update_tests.py wasm-opt pauses on it for a very long time which is indeed very noticeable...
cc @juj
I wonder if it would make sense to remove that test, since it is now testing an arbitrary limit that no longer exists? Originally the issue was large memory usage, but now it does look like the optimization pass is not fast enough to operate on a >64K locals case in a timely manner.
For end users that might have such many locals: if running the coalesce-locals pass for 64K locals is unrealistically slow, it is very likely also unrealistically slow for, say, 63K locals, so reinstating a limit on 64K does not seem based on actual data. Maybe there should be some kind of cmdline tuneable option that would enable users to choose a ceiling to disable that pass at? And by default disable liveness already at something like 8K or 32K or wherever a realistic perf. limit would happen to fall at?
@juj out of curiosity, given the original (very sensible) motivation for the change, is the 5.5M LOC example behaving better now that it's in place? Presumably it involves more than 64K locals.
Yes, perhaps the test is excessive, to check 64K. I agree 63K would likely also be a problem.
We can at least switch the test from 64K to 8K, since 8K is where we start to use a sparse matrix, so that's the minimum for us to get testing of sparsity. I opened https://github.com/WebAssembly/binaryen/pull/4635 for that now.
However, even that testcase takes 4 seconds to run - and it's doing basically nothing, it's the simplest case possible with 8K locals - so probably we should rewrite the algorithm to actually benefit from sparsity, something like store a small set of interferences for each local, instead of doing a matrix lookup of every possible pair of it and something else.
(edit: profiling, all the time is spent in hashing)
@juj out of curiosity, given the original (very sensible) motivation for the change, is the https://github.com/WebAssembly/binaryen/pull/4567#issuecomment-1086068895 behaving better now that it's in place? Presumably it involves more than 64K locals.
I did test that the change resolved the build warnings that the user was having, but like mentioned in the thread, I was unable to reproduce the fault that the user had reported. We are in the process of getting an updated Unity version with the fix in to the customer to be able to test, but that will take some weeks to get through the release cycles.