wasmtime-dotnet
Use generated generic `Linker.DefineFunction()` overloads for efficiently invoking callbacks
Hi! Currently, callbacks defined with `Linker.DefineFunction()` and `Function.FromCallback()` are called using reflection, which has some overhead:
https://github.com/bytecodealliance/wasmtime-dotnet/blob/f3a383a7b4391b8a653af521499de96f50b975ce/src/Function.cs#L2421-L2423
With this PR, T4 text templates are used to automatically generate generic `Linker.DefineFunction()` and `Function.FromCallback()` overloads for different combinations of parameter count, result count, and whether the callback takes a `Caller` argument.
Based on the idea from @martindevans in https://github.com/bytecodealliance/wasmtime-dotnet/pull/160#issuecomment-1273516711, the template generates code that uses `ValueBox.Converter<T>` for each generic parameter and result type and then invokes the callback directly. This avoids reflection and a number of heap allocations (e.g. allocating the arguments array, and boxing the arguments and return values), thereby improving performance and contributing to https://github.com/bytecodealliance/wasmtime-dotnet/issues/113.
This way, no dynamic code generation will be needed (see previous PR #160).
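To illustrate the difference between the two call paths, here is a minimal, self-contained sketch. It does not use the wasmtime-dotnet types: `CallbackInvokers`, `Unbox<T>`, and the `long[]` argument slots are hypothetical stand-ins chosen for this example, and the real generated code may look quite different.

```csharp
using System;

public static class CallbackInvokers
{
    // Hypothetical stand-in for a per-type converter: turns a raw 64-bit slot
    // into T without boxing. It is resolved once per generic instantiation.
    private static class Unbox<T>
    {
        public static readonly Func<long, T> From = Create();

        private static Func<long, T> Create()
        {
            if (typeof(T) == typeof(int))
                return (Func<long, T>)(object)new Func<long, int>(raw => (int)raw);
            if (typeof(T) == typeof(long))
                return (Func<long, T>)(object)new Func<long, long>(raw => raw);
            throw new NotSupportedException(typeof(T).Name);
        }
    }

    // Reflection-style path: one object[] allocation plus one box per argument.
    public static void InvokeViaReflection(Delegate callback, long[] rawArgs)
    {
        var parameters = callback.Method.GetParameters();
        var boxed = new object[rawArgs.Length];
        for (int i = 0; i < rawArgs.Length; i++)
            boxed[i] = Convert.ChangeType(rawArgs[i], parameters[i].ParameterType);
        callback.DynamicInvoke(boxed);
    }

    // Generic path: the delegate type is known at compile time, so the
    // callback is invoked directly with unboxed, strongly typed arguments.
    public static void InvokeTyped<T1, T2>(Action<T1, T2> callback, long[] rawArgs)
        => callback(Unbox<T1>.From(rawArgs[0]), Unbox<T2>.From(rawArgs[1]));
}

public static class Program
{
    public static void Main()
    {
        long sum = 0;
        Action<int, long> callback = (a, b) => sum += a + b;
        var rawArgs = new long[] { 40, 2 };

        CallbackInvokers.InvokeViaReflection(callback, rawArgs);
        CallbackInvokers.InvokeTyped(callback, rawArgs);

        Console.WriteLine(sum); // each call adds 42
    }
}
```

Both paths call the same delegate. Only the typed one avoids the `object[]` allocation and the boxing, which is where the savings described above would come from.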
Resolving the correct overload by the C# compiler seems to work well:
Currently, overloads are generated for up to 12 parameters and 4 result values, which yields 2 * 13 * 5 = 130 overloads (with or without a `Caller` argument, 0 to 12 parameters, and 0 to 4 results). Note that more than 200 overloads cause the C# compiler to fail to resolve them correctly (tested in VS 17.4.0 Preview 2.1).
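The overload count can be checked by enumerating the combinations. This is only a small sketch; `OverloadCount` is a hypothetical helper, not part of the T4 templates.

```csharp
using System;

public static class OverloadCount
{
    // Counts one overload per (has Caller, parameter count, result count) combination.
    public static int Count(int maxParameters, int maxResults)
    {
        int count = 0;
        foreach (bool hasCaller in new[] { false, true })
            for (int parameters = 0; parameters <= maxParameters; parameters++)
                for (int results = 0; results <= maxResults; results++)
                    count++;
        return count;
    }

    public static void Main() => Console.WriteLine(Count(12, 4)); // prints 130
}
```

With the same formula, raising the limit to, say, 16 parameters and 5 results would give 2 * 17 * 6 = 204 combinations, which is already above the 200-overload mark mentioned above.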
For delegate types not covered by the overloads (e.g. an `Action` or `Func` with more parameter/result types, or other delegate types), or if the delegate type isn't known at compile-time, reflection will still be used by providing an overload that takes a `Delegate` (that one is still defined in `Linker.cs` and `Function.cs`).
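The following sketch shows how the C# compiler would pick between generic overloads and a `Delegate` fallback. `Registry.Define` is a hypothetical API made up for this example (it only reports which overload was chosen) and has just two generic overloads instead of the generated set.

```csharp
using System;

public static class Registry
{
    // Stand-ins for the generated generic overloads.
    public static string Define<T>(Action<T> callback) => "generic";
    public static string Define<T1, T2>(Action<T1, T2> callback) => "generic";

    // Stand-in for the fallback overload that would invoke via reflection.
    public static string Define(Delegate callback) => "reflection";
}

public static class Program
{
    public static void Main()
    {
        Action<int> one = x => { };
        Action<int, int, int> three = (x, y, z) => { };

        // Exact match on Action<T> is preferred over the conversion to Delegate.
        Console.WriteLine(Registry.Define(one));            // generic

        // No generic overload takes three parameters, so only Delegate applies.
        Console.WriteLine(Registry.Define(three));          // reflection

        // An explicit cast forces the fallback even when a generic overload exists.
        Console.WriteLine(Registry.Define((Delegate)one));  // reflection
    }
}
```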
Currently, the generated files (`Linker.DefineFunction.cs`, `Function.FromCallback.cs`) are part of the repo as there doesn't seem to be an easy way to automatically generate the files on each build without using additional tools. You can either generate the files in Visual Studio (by saving the `.tt` file), or by using the `dotnet-t4` tool:
dotnet tool install --global dotnet-t4 --version 2.3.0
t4 src/Linker.DefineFunction.tt -o src/Linker.DefineFunction.cs
t4 src/Function.FromCallback.tt -o src/Function.FromCallback.cs
The performance improvements are in the same range as those of the previous PR (#160), which used `DynamicMethod` to dynamically generate code to invoke the callback.
The largest performance boost occurs when defining a function with a single parameter. When testing with the following code with .NET 7.0.0-rc.1 on Windows 10 Version 21H2 x64, using an `Action<int>`:
    using System;
    using System.Diagnostics;
    using Wasmtime;

    using var config = new Config();
    config.WithOptimizationLevel(OptimizationLevel.Speed);
    using var engine = new Engine(config);
    using var module = Module.FromText(
        engine,
        "hello",
        @"
    (module
      (func $hello (import """" ""hello"") (param i32))
      (func (export ""run"")
        (local $0 i32)
        loop $for-loop|0
          local.get $0
          i32.const 2000000
          i32.lt_s
          if
            local.get $0
            call $hello
            local.get $0
            i32.const 1
            i32.add
            local.set $0
            br $for-loop|0
          end
        end
      )
    )
    ");
    using var linker = new Linker(engine);
    using var store = new Store(engine);

    // Host callback that the module calls 2,000,000 times per run.
    int calls = 0;
    linker.DefineFunction(
        "",
        "hello",
        (int x) =>
        {
            calls++;
        }
    );

    var instance = linker.Instantiate(store, module);
    var run = instance.GetAction("run")!;

    // Time five consecutive runs.
    var sw = new Stopwatch();
    for (int i = 0; i < 5; i++)
    {
        sw.Restart();
        run();
        sw.Stop();
        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }
Before the change, the times are as follows (when compiling for `Release`):
Elapsed: 00:00:00.3099829
Elapsed: 00:00:00.4299516
Elapsed: 00:00:00.4231544
Elapsed: 00:00:00.4376052
Elapsed: 00:00:00.4250763
After the change:
Elapsed: 00:00:00.1333811
Elapsed: 00:00:00.1264500
Elapsed: 00:00:00.1125693
Elapsed: 00:00:00.1105346
Elapsed: 00:00:00.1083284
However, when using more than one argument, the time with reflection suddenly decreases, and the performance gain is much smaller. For example, using an `Action<int, float, long>`:
    using System;
    using System.Diagnostics;
    using Wasmtime;

    using var config = new Config();
    config.WithOptimizationLevel(OptimizationLevel.Speed);
    using var engine = new Engine(config);
    using var module = Module.FromText(
        engine,
        "hello",
        @"
    (module
      (func $hello (import """" ""hello"") (param i32 f32 i64))
      (func (export ""run"")
        (local $0 i32)
        loop $for-loop|0
          local.get $0
          i32.const 2000000
          i32.lt_s
          if
            local.get $0
            f32.const 123.456
            i64.const 1234567890
            call $hello
            local.get $0
            i32.const 1
            i32.add
            local.set $0
            br $for-loop|0
          end
        end
      )
    )
    ");
    using var linker = new Linker(engine);
    using var store = new Store(engine);

    // Host callback with three parameters of different types.
    int calls = 0;
    linker.DefineFunction(
        "",
        "hello",
        (int x, float y, long z) =>
        {
            calls++;
        }
    );

    var instance = linker.Instantiate(store, module);
    var run = instance.GetAction("run")!;

    // Time five consecutive runs.
    var sw = new Stopwatch();
    for (int i = 0; i < 5; i++)
    {
        sw.Restart();
        run();
        sw.Stop();
        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }
Before the change:
Elapsed: 00:00:00.3110580
Elapsed: 00:00:00.2320351
Elapsed: 00:00:00.2336036
Elapsed: 00:00:00.2323470
Elapsed: 00:00:00.2347170
After the change:
Elapsed: 00:00:00.2289068
Elapsed: 00:00:00.2005959
Elapsed: 00:00:00.2001471
Elapsed: 00:00:00.1991165
Elapsed: 00:00:00.1986648
Testing with a `Func<int, float, long, ValueTuple<int, int, long>>`:
    using System;
    using System.Diagnostics;
    using Wasmtime;

    using var config = new Config();
    config.WithOptimizationLevel(OptimizationLevel.Speed);
    using var engine = new Engine(config);
    using var module = Module.FromText(
        engine,
        "hello",
        @"
    (module
      (func $hello (import """" ""hello"") (param i32 f32 i64) (result i32 i32 i64))
      (func (export ""run"")
        (local $0 i32)
        loop $for-loop|0
          local.get $0
          i32.const 2000000
          i32.lt_s
          if
            local.get $0
            f32.const 123.456
            i64.const 1234567890
            call $hello
            drop
            drop
            drop
            local.get $0
            i32.const 1
            i32.add
            local.set $0
            br $for-loop|0
          end
        end
      )
    )
    ");
    using var linker = new Linker(engine);
    using var store = new Store(engine);

    // Host callback with three parameters that returns three results as a tuple.
    int calls = 0;
    linker.DefineFunction(
        "",
        "hello",
        (int x, float y, long z) =>
        {
            calls++;
            return (1, 2, 3L);
        }
    );

    var instance = linker.Instantiate(store, module);
    var run = instance.GetAction("run")!;

    // Time five consecutive runs.
    var sw = new Stopwatch();
    for (int i = 0; i < 5; i++)
    {
        sw.Restart();
        run();
        sw.Stop();
        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }
Before:
Elapsed: 00:00:00.6005550
Elapsed: 00:00:00.5286508
Elapsed: 00:00:00.5350325
Elapsed: 00:00:00.5246926
Elapsed: 00:00:00.5371668
After:
Elapsed: 00:00:00.4286678
Elapsed: 00:00:00.3949199
Elapsed: 00:00:00.3934236
Elapsed: 00:00:00.3999471
Elapsed: 00:00:00.3924870
TODOs:
- After merging #161, the generated code will need to be adjusted to take the `returnsTuple` value into account when checking that the overload that was generated without implicitly using a `ValueTuple` must not use `ValueTuple` as generic type parameter.
Comparison to other approaches:
- Compared to generating dynamic code (see #160), this will also work on .NET Runtimes that don't support dynamic code (or would interpret it). Additionally, this will avoid the small runtime cost of code generation when defining the function. However, this approach only works if the delegate is known at compile-time to be a `Func<...>`/`Action<...>` with the maximum number of type parameters; otherwise, reflection will still be used to call it. (Note that this `Func`/`Action` type restriction is already the case today.)
- Compared to source generators, this approach is independent of the language used to compile to IL (e.g. C#, F#, VB.NET etc.).
What do you think?
Thanks!
It seems we can improve performance even more by using unchecked function variants (https://github.com/bytecodealliance/wasmtime/pull/3350). With commit https://github.com/kpreisser/wasmtime-dotnet/commit/09560a090f404eeeb1afa3f884f13bc97eb79f19 (separate branch `generic-linker-define-function-unchecked` that is based on this PR), I added a `ValueRaw` struct (that maps to `wasmtime_val_raw_t`) and an `IValueRawConverter<T>` interface similar to the existing ones, which are then used by the unchecked callback functions.
Since the .NET side knows the exact parameter and result types used by the function, this should be safe. (However, I'm not very familiar with `wasmtime`'s resource management, so I'm not sure if I have done the `externref` and `funcref` management right in the converters.)
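As a rough illustration of the idea (not the code from the linked commit), an untyped value slot can be modelled as an explicit-layout struct with a per-type converter. Everything here is an assumption made for the example: `RawSlot`, `IRawSlotConverter<T>`, the 16-byte size, restricting the slot to numeric members, and writing the result back into the first argument slot.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical, simplified slot: all members overlap at offset 0, like a C union.
// The 16-byte size is an assumption; reference types (externref, funcref) are omitted.
[StructLayout(LayoutKind.Explicit, Size = 16)]
public struct RawSlot
{
    [FieldOffset(0)] public int I32;
    [FieldOffset(0)] public long I64;
    [FieldOffset(0)] public float F32;
    [FieldOffset(0)] public double F64;
}

// Hypothetical converter interface: reads or writes one slot without tagging or boxing.
public interface IRawSlotConverter<T>
{
    T Unbox(in RawSlot slot);
    void Box(T value, ref RawSlot slot);
}

public sealed class Int32SlotConverter : IRawSlotConverter<int>
{
    public int Unbox(in RawSlot slot) => slot.I32;
    public void Box(int value, ref RawSlot slot) => slot.I32 = value;
}

public sealed class Int64SlotConverter : IRawSlotConverter<long>
{
    public long Unbox(in RawSlot slot) => slot.I64;
    public void Box(long value, ref RawSlot slot) => slot.I64 = value;
}

public static class Program
{
    private static readonly Int32SlotConverter I32 = new Int32SlotConverter();
    private static readonly Int64SlotConverter I64 = new Int64SlotConverter();

    // No type tags are checked: the caller must already know the signature.
    public static void InvokeUnchecked(Func<int, long, long> callback, Span<RawSlot> argsAndResults)
    {
        long result = callback(I32.Unbox(argsAndResults[0]), I64.Unbox(argsAndResults[1]));
        I64.Box(result, ref argsAndResults[0]); // result reuses the first slot
    }

    public static void Main()
    {
        var slots = new RawSlot[2];
        slots[0].I32 = 40;
        slots[1].I64 = 2;

        InvokeUnchecked((x, y) => x + y, slots);
        Console.WriteLine(slots[0].I64); // prints 42
    }
}
```

Because nothing in the slot records its type, reading the wrong member silently reinterprets the bits. That is why this only seems safe when the host side already knows the exact signature, as noted above.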
When trying the above benchmarks again (under .NET 7.0.0-rc.2, Windows 10 Version 21H2 x64), I get the following results:
Benchmark 1 (`Action<int>`):
Without this PR:
Elapsed: 00:00:00.2786417
Elapsed: 00:00:00.4104838
Elapsed: 00:00:00.4092130
Elapsed: 00:00:00.4186214
Elapsed: 00:00:00.4116475
With this PR:
Elapsed: 00:00:00.1474995
Elapsed: 00:00:00.1077812
Elapsed: 00:00:00.1064291
Elapsed: 00:00:00.1127987
Elapsed: 00:00:00.1071312
With this PR + commit https://github.com/kpreisser/wasmtime-dotnet/commit/09560a090f404eeeb1afa3f884f13bc97eb79f19:
Elapsed: 00:00:00.0673361
Elapsed: 00:00:00.0612401
Elapsed: 00:00:00.0603837
Elapsed: 00:00:00.0458639
Elapsed: 00:00:00.0466120
Benchmark 2 (`Action<int, float, long>`):
Without this PR:
Elapsed: 00:00:00.3057047
Elapsed: 00:00:00.2382728
Elapsed: 00:00:00.2427044
Elapsed: 00:00:00.2426993
Elapsed: 00:00:00.2464523
With this PR:
Elapsed: 00:00:00.2293002
Elapsed: 00:00:00.2010061
Elapsed: 00:00:00.2026756
Elapsed: 00:00:00.2042638
Elapsed: 00:00:00.2037475
With this PR + commit https://github.com/kpreisser/wasmtime-dotnet/commit/09560a090f404eeeb1afa3f884f13bc97eb79f19:
Elapsed: 00:00:00.0803906
Elapsed: 00:00:00.0766781
Elapsed: 00:00:00.0751499
Elapsed: 00:00:00.0688984
Elapsed: 00:00:00.0541781
Benchmark 3 (`Func<int, float, long, ValueTuple<int, int, long>>`):
Without this PR:
Elapsed: 00:00:00.5830719
Elapsed: 00:00:00.5139658
Elapsed: 00:00:00.5282328
Elapsed: 00:00:00.5148367
Elapsed: 00:00:00.5155885
With this PR:
Elapsed: 00:00:00.4300748
Elapsed: 00:00:00.3947863
Elapsed: 00:00:00.3907995
Elapsed: 00:00:00.3930814
Elapsed: 00:00:00.3916519
With this PR + commit https://github.com/kpreisser/wasmtime-dotnet/commit/09560a090f404eeeb1afa3f884f13bc97eb79f19:
Elapsed: 00:00:00.1369848
Elapsed: 00:00:00.1291589
Elapsed: 00:00:00.0874956
Elapsed: 00:00:00.0789046
Elapsed: 00:00:00.0782615
With the last benchmark, this seems to be roughly a 6x improvement compared to the current state (without this PR).
What do you think?
Thanks!
I think the PR is now ready for review. The use of unchecked function variants could be done in a separate follow-up PR.
A downside of the overloads shows up if you have a `Func<...>` accepting at most 12 values but returning more than 4 values, for example `Func<ValueTuple<int, int, int, int, int>>`.
In that case, the overload `Function.FromCallback<int, ValueTuple<int, int, int, int, int>>` would be resolved (because the generated overloads only include `ValueTuple` with up to 4 type parameters), which will throw an exception due to the use of the tuple.
Instead, you would need to explicitly cast the delegate to `Delegate` to make it work (see commit https://github.com/bytecodealliance/wasmtime-dotnet/pull/163/commits/dbd89455ebd94ff426888e018a1368cefda28dc6 for an example), as in that case the reflection variant is used.
Edit: I implemented a solution by falling back to using reflection (instead of throwing an exception) in such a case.
Thank you!
@kpreisser this looks like really excellent work! I'll try to get this reviewed early next week.
@kpreisser I apologize for the delay in reviewing this PR. I have time to dive into it fully tomorrow.
I'll review the recent changes and I'll likely approve, but let's hold off on merging this until CI goes green again with the other PR.
@kpreisser thanks for all the hard work on this!