Make C# Levenshtein faster
I have:
- [x] Read the project README, including the benchmark descriptions
Description of changes
- Use swapping instead of a temp variable to replace array values.
- Use regular arrays because they're already initialized with 0.
- Only check the length of strings to remove extra branching cause by string.IsNullOrEmpty method
Have you actually tested that this is faster? Reading your changes I'm pretty sure a lot of them is actually making the code slower. By creating arrays using "new" you are now causing heap allocations which are slower than stack allocated memory and require garbage collection. They are only "already initialized to 0" from your point-of-view but under the hood the CPU still has to initialize them to 0.
- Comparing strings with "==" is not fast at all, it first checks if either string is null before calling string.Equals.
- The "IsNullOrEmpty" was introduced by your first commit, the original code was ".IsEmpty" which was almost certainly faster, since it doesn't check for null. I'm not sure " == 0" is any different from the original ".IsEmpty".
- Changing from "int" to "var" shouldn't make any difference to performance at all.
Probably largest time taken by the C# is just startup time when the IL gets compiled to machine code. If code was compiled with NativeAOT it should run much faster.
Unfortunately if we ignore the startup-time though runtime is usually a bit worse with NativeAOT since the JIT doesn't run and can't apply further optimizations to the code as it runs. So if the code ran for long enough where startup-time becomes a small factor NativeAOT could loose out again.
NativeAOT does seem to loose out a tiny bit with the run times of the new input data. It just barely runs the test faster than the non-AOT version, even though it starts much faster:
- https://pez.github.io/languages-visualizations/#levenshtein
- https://pez.github.io/languages-visualizations/#hello-world
Running on my M2 Pro (Mac14,9), the new implementation is much faster.
==only calls string.Equals, string.Equals checks null- IsEmpty is an extension method of Span, that can't be used with regular strings
- int was replaced with var by my IDE automatically
This implementation removed Span, which sadly increased the memory usage of the code, but made it run faster. Since this benchmark only cares the speed and not the memory usage, i think this solution is better than the original.
Please check on your own site if it actually got faster. Thanks.
Test the original code using net 8.0 it'll be faster. There is problem in net 9.0. if you replace https://github.com/bddicken/languages/blob/655d7e7fe403f3c0e52102838a2294c227e9f2c0/levenshtein/csharp/code.cs#L27 with
ReadOnlySpan<byte> bytes1 = MemoryMarshal.Cast<char, byte>(str1);
ReadOnlySpan<byte> bytes2 = MemoryMarshal.Cast<char, byte>(str2);
if (bytes1.SequenceEqual(bytes2))
they are the same speed. Not sure why
Would be awesome if someone wanted to help getting C# over to the new benchmark runner:
- https://github.com/bddicken/languages/issues/371