RobustToolbox
Matrix3x2 and Matrix SIMD
Marked as a draft, because this isn't so much a PR as it is just me sharing my benchmark code and trying to make sense of the results. Also, disclaimer: All I know about SIMD & C# Unsafe code I've learnt in the last week, so there might be rookie mistakes.
There are two main benchmarks to look at. The first is `VectorTransformBenchmark`, which applies a sequence of three matrix transforms to a vector using the existing (non-SIMD) code, some new SIMD functions, and the System.Numerics equivalents. On my machine, the result of that is:
Triple Vector Transform Results
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 7 3800X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=6.0.300
[Host] : .NET Core 6.0.5 (CoreCLR 6.0.522.21309, CoreFX 6.0.522.21309), X64 RyuJIT
DefaultJob : .NET Core 6.0.5 (CoreCLR 6.0.522.21309, CoreFX 6.0.522.21309), X64 RyuJIT
| Method | Mean | Error | StdDev | Ratio | Code Size |
|---|---|---|---|---|---|
| 'No SIMD' | 25.816 ns | 0.0236 ns | 0.0209 ns | 1.00 | 290 B |
| 'Using SSE' | 8.831 ns | 0.0054 ns | 0.0048 ns | 0.34 | 355 B |
| 'Using SSE3' | 8.466 ns | 0.0065 ns | 0.0054 ns | 0.33 | 301 B |
| 'Using FMA' | 8.414 ns | 0.0138 ns | 0.0115 ns | 0.33 | 316 B |
| System.Numerics | 5.165 ns | 0.0042 ns | 0.0037 ns | 0.20 | 300 B |
If there is nothing heinously wrong with my benchmarking setup, that at least indicates that moving to the Numerics structs would be best for performance, albeit only by a relatively small amount. The relative results are basically the same when doing just a single transform. Nothing I have tried has come close to the Numerics results, and the best benchmark results I can get require using Numerics within the Robust structs themselves in order to convert to and from `Vector128` (`Unsafe.AsPointer`).
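For anyone who wants to reproduce the System.Numerics row without the benchmark project, here is a minimal sketch of what "three transforms applied to a vector" looks like with the built-in types. It assumes the Numerics variant boils down to `Vector2.Transform` with `Matrix3x2`; the class and method names (`TripleTransformSketch`, `Sequential`, `Combined`) are made up for illustration and are not taken from the PR.

```csharp
using System.Numerics;

// Hypothetical names; only Vector2, Matrix3x2 and Vector2.Transform are real APIs.
public static class TripleTransformSketch
{
    // Apply the three transforms one after another.
    public static Vector2 Sequential(Vector2 v, Matrix3x2 a, Matrix3x2 b, Matrix3x2 c)
    {
        v = Vector2.Transform(v, a);
        v = Vector2.Transform(v, b);
        return Vector2.Transform(v, c);
    }

    // Combine the matrices first, then transform once.
    // System.Numerics uses the row-vector convention, so v * a * b * c
    // matches the sequential order above.
    public static Vector2 Combined(Vector2 v, Matrix3x2 a, Matrix3x2 b, Matrix3x2 c)
    {
        return Vector2.Transform(v, a * b * c);
    }
}
```

With `a = Matrix3x2.CreateTranslation(1, 2)`, `b = Matrix3x2.CreateScale(2)`, `c = Matrix3x2.CreateTranslation(3, 4)` and `v = (1, 1)`, both methods give `(7, 10)`. Which of the two shapes is faster is exactly the kind of thing the tables here are trying to answer, so treat this as a correctness reference rather than a performance claim.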
But then there is the other benchmark, `MatrixMultiplicationBenchmark`, which is just a sequence of three matrix multiplications, again using "normal" code, SIMD code, and System.Numerics.
Triple Matrix Multiplication Results
| Method | Mean | Error | StdDev | Ratio | Code Size |
|---|---|---|---|---|---|
| 'No Simd' | 13.29 ns | 0.013 ns | 0.010 ns | 1.00 | 730 B |
| 'Using Fma' | 10.95 ns | 0.007 ns | 0.006 ns | 0.82 | 566 B |
| 'Using Sse' | 11.09 ns | 0.006 ns | 0.005 ns | 0.83 | 575 B |
| System.Numerics | 38.40 ns | 0.008 ns | 0.008 ns | 2.89 | 644 B |
| 'System.Numerics (operator)' | 31.50 ns | 0.022 ns | 0.020 ns | 2.37 | 449 B |
The SIMDified code only seems to do ~20% better, but the System.Numerics results are just garbage? This might have to do with the fact that, AFAICT, the Numerics types have no in/out/readonly variants? If only doing a single multiplication, the results aren't quite as heinous, but still just worse than the existing code?
Single Matrix Multiplication Results
| Method | Mean | Error | StdDev | Ratio | Code Size |
|---|---|---|---|---|---|
| 'No Simd' | 7.582 ns | 0.0237 ns | 0.0198 ns | 1.00 | 266 B |
| 'Using Fma' | 5.397 ns | 0.0052 ns | 0.0046 ns | 0.71 | 205 B |
| 'Using Sse' | 5.510 ns | 0.0035 ns | 0.0032 ns | 0.73 | 208 B |
| System.Numerics | 9.796 ns | 0.0310 ns | 0.0275 ns | 1.29 | 361 B |
| 'System.Numerics (operator)' | 8.593 ns | 0.0044 ns | 0.0039 ns | 1.13 | 314 B |
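To make the in/out guess above concrete, here is a rough sketch of a by-reference multiply wrapped around the Numerics struct. `MatrixInSketch.Multiply` is a hypothetical helper, not something in System.Numerics or in this PR; the arithmetic is the standard 3x2 affine product, which should match `Matrix3x2.Multiply` and the `*` operator.

```csharp
using System.Numerics;

// Hypothetical helper for illustration only.
public static class MatrixInSketch
{
    // Operands are passed by readonly reference and the result comes back via out,
    // so no Matrix3x2 is copied at the call boundary.
    public static void Multiply(in Matrix3x2 a, in Matrix3x2 b, out Matrix3x2 result)
    {
        // All six elements are computed before the assignment, so this stays
        // correct even if result aliases a or b.
        result = new Matrix3x2(
            a.M11 * b.M11 + a.M12 * b.M21,
            a.M11 * b.M12 + a.M12 * b.M22,
            a.M21 * b.M11 + a.M22 * b.M21,
            a.M21 * b.M12 + a.M22 * b.M22,
            a.M31 * b.M11 + a.M32 * b.M21 + b.M31,
            a.M31 * b.M12 + a.M32 * b.M22 + b.M32);
    }
}
```

Dropping something like this into the benchmark next to the plain Numerics rows would show whether argument passing is really what hurts, or whether the gap comes from somewhere else.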
So uhhh:
- Someone please check my benchmark code isn't hot garbage for some reason
- Are the results consistent across different CPUs (CPU info included in the first set of results)?
- Given that matrix multiplication appears to be trash while vector transform is only slightly slower, do we just keep using Robust math + SIMD for the sake of convenience?
- This would still involve refactoring Robust Matrix3 -> Robust Matrix3x2.
Is this still alive?
The actual issue this is related to still hasn't been addressed: should we switch to using System.Numerics for vectors/matrices, or keep using Robust.Math?
If people want to keep Robust.Math around, it should probably use some SIMD, so this PR is still relevant/useful. If not, the PR should be closed and a separate PR to move over to System.Numerics should be opened.
I guess this should become a maintainer meeting topic, but I already know I probably won't be around for the one happening tomorrow.