NAudio
CPU intrinsics
This pull request is meant to start a discussion about using the CPU intrinsics introduced in .NET Core 3.0 in NAudio, by providing an example of how they might be integrated into a class.
I benchmarked this pull request and, as expected, VolumeSampleProvider runs significantly faster:
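As a rough illustration of the kind of change being discussed (a sketch, not the code from this pull request), the per-sample volume multiply can be vectorised with the SSE intrinsics in `System.Runtime.Intrinsics.X86`. It processes four floats per instruction and falls back to a scalar loop for the remainder, or for the whole buffer when `Sse.IsSupported` is false. `VolumeKernel` and `ApplyVolume` are hypothetical names, and the code assumes .NET Core 3.0 or later with unsafe code enabled (`<AllowUnsafeBlocks>true</AllowUnsafeBlocks>`).

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, for illustration only: scales buffer[offset .. offset + count) in place.
public static class VolumeKernel
{
    public static unsafe void ApplyVolume(float[] buffer, int offset, int count, float volume)
    {
        if (buffer is null) throw new ArgumentNullException(nameof(buffer));
        // Rejects negative values and ranges that run past the end of the array.
        if ((uint)offset > (uint)buffer.Length || (uint)count > (uint)(buffer.Length - offset))
            throw new ArgumentOutOfRangeException(nameof(count));

        int i = 0;
        fixed (float* basePtr = buffer)
        {
            float* p = basePtr + offset;

            if (Sse.IsSupported)
            {
                // Broadcast the gain into all four lanes once, outside the loop.
                Vector128<float> gain = Vector128.Create(volume);

                // Vector loop: unaligned load, multiply and store of four samples at a time.
                for (; i <= count - 4; i += 4)
                {
                    Vector128<float> samples = Sse.LoadVector128(p + i);
                    Sse.Store(p + i, Sse.Multiply(samples, gain));
                }
            }

            // Scalar tail (the last count % 4 samples), or the full buffer without SSE.
            for (; i < count; i++)
            {
                p[i] *= volume;
            }
        }
    }
}
```

Both paths compute the same single-precision product per sample, so the output should not depend on which branch runs. The JIT treats `Sse.IsSupported` as a constant, so the branch that cannot be taken on a given machine is dropped from the compiled code.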
Is this compatible with your vision for NAudio?
Would you be interested in a larger pull request that uses intrinsics in a wide variety of places?
I've set up a proper benchmarking project in my repo to allow for impartial comparison.
Benchmark for VolumeSampleProvider:
I'm initially focusing on the classes I use personally, so the next class I plan to optimise is BiQuadFilter.
I haven't had a chance to look at this in detail yet (unfortunately, other projects are taking priority at the moment), but I wanted to jump in and say that I like the idea of providing optimised versions of some of these classes, and it looks like you've achieved a really impressive speedup.
One thing I have been experimenting with is whether NAudio 2.0 should fully embrace Span<T>. I made a quick proof of concept a while ago in this repo. With this approach, ISampleProvider would look like this:
If I do get time to create that sort of NAudio 2.0, new features like CPU intrinsics would definitely be useful. There would also be benefits in using the new compiler intrinsics to access the `calli` instruction, which would help resolve some threading headaches with COM interop.
For the existing NAudio 1.9, I'm happy to take PRs for any performance changes that don't break the public interface and have been adequately tested.
I would say the use of Span in the way you suggest for NAudio 2.0 is ideal. The existing offset and sampleCount simply become:
volumeSampleProvider.Read(buffer.Slice(offset, sampleCount));
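To make the Span idea concrete, here is a minimal sketch assuming a hypothetical `ISpanSampleProvider` interface with a single `int Read(Span<float> buffer)` method. It is not the interface from the proof-of-concept repo mentioned above, and `ConstantSource` and `SpanVolumeProvider` are likewise illustrative names. The array overload shows how the existing `(buffer, offset, sampleCount)` arguments collapse into one slice.

```csharp
using System;

// Hypothetical Span-based interface, for illustration only; NAudio 1.9's ISampleProvider
// instead reads into (float[] buffer, int offset, int count).
public interface ISpanSampleProvider
{
    int Read(Span<float> buffer);
}

// Hypothetical test source that fills every requested sample with a constant value.
public sealed class ConstantSource : ISpanSampleProvider
{
    private readonly float value;

    public ConstantSource(float value) => this.value = value;

    public int Read(Span<float> buffer)
    {
        buffer.Fill(value);
        return buffer.Length;
    }
}

// Hypothetical volume stage: reads from its source, then scales only the samples actually read.
public sealed class SpanVolumeProvider : ISpanSampleProvider
{
    private readonly ISpanSampleProvider source;

    public SpanVolumeProvider(ISpanSampleProvider source) => this.source = source;

    public float Volume { get; set; } = 1f;

    public int Read(Span<float> buffer)
    {
        int samplesRead = source.Read(buffer);
        if (Volume != 1f)
        {
            Span<float> written = buffer.Slice(0, samplesRead);
            for (int n = 0; n < written.Length; n++)
            {
                written[n] *= Volume;
            }
        }
        return samplesRead;
    }

    // Array-based entry point: offset and sampleCount become a single bounds-checked slice.
    public int Read(float[] buffer, int offset, int sampleCount) =>
        Read(buffer.AsSpan(offset, sampleCount));
}
```

Because the slice carries its own length, the Span overload needs no separate offset arithmetic, and an out-of-range offset or count is caught once by `AsSpan` rather than inside the loop.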
This is still a very interesting prospect. I'd like to see these benchmarks rerun on .NET 5 to check whether there's an even bigger performance gain.
Are we going to see something like this in the next version of NAudio?
.NET 5 doesn't provide any additional gain when CPU intrinsics are used (understandably: CPU intrinsics are the absolute fastest way to do this, giving a 10x speed improvement).
When CPU intrinsics are not used, simply moving from a .NET Core runtime to .NET 5 does not yield any improvement.
Still, CPU intrinsics appear to be a clear win, even if there's no difference between .NET Core 3.1 and .NET 5.
Yep, there's another example of a CPU intrinsics speedup in #794, if you're curious.
This is very nice and should probably be merged, @markheath.