CPU intrinsics

Open macaba opened this issue 5 years ago • 8 comments

This pull request is here to generate some discussion about integrating the CPU intrinsics available in .NET Core 3.0 into NAudio by providing a code example of how it might be integrated into a class.

I benchmarked this pull request, and as expected, VolumeSampleProvider runs significantly faster:

[image: VolumeSampleProvider benchmark results]

Is this compatible with your vision for NAudio?

Would you be interested in a larger pull request that uses intrinsics in a wide variety of places?
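
For readers who have not used the .NET Core 3.0 intrinsics API, here is a minimal sketch of the kind of change being discussed. It is not the code from the pull request: `VolumeScaler` and `Scale` are hypothetical names chosen for illustration, and the sketch assumes an x86 CPU with SSE, falling back to a scalar loop otherwise.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not part of NAudio.
public static class VolumeScaler
{
    // Multiplies every sample in place by the given volume.
    public static void Scale(Span<float> samples, float volume)
    {
        int i = 0;

        if (Sse.IsSupported)
        {
            // Broadcast the gain into all four lanes of a 128-bit vector.
            Vector128<float> gain = Vector128.Create(volume);

            // Reinterpret the sample buffer as blocks of four floats.
            // Any trailing samples that do not fill a block are excluded.
            Span<Vector128<float>> blocks =
                MemoryMarshal.Cast<float, Vector128<float>>(samples);

            for (int b = 0; b < blocks.Length; b++)
            {
                blocks[b] = Sse.Multiply(blocks[b], gain);
            }

            i = blocks.Length * Vector128<float>.Count;
        }

        // Scalar path: handles the remainder, or everything when SSE is unavailable.
        for (; i < samples.Length; i++)
        {
            samples[i] *= volume;
        }
    }
}
```

Calling `VolumeScaler.Scale(buffer, 0.5f)` on a `float[]` halves every sample. A wider variant using `Avx` and `Vector256<float>` would follow the same pattern.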

macaba, Oct 29 '19 18:10

I've set up a proper benchmarking project in my repo to allow for impartial comparison.

Benchmark for VolumeSampleProvider:

[image: VolumeSampleProvider benchmark results]

I'm initially focusing on the classes that I use personally so my next class to optimise is the BiQuadFilter.

macaba, Nov 02 '19 13:11

Haven't had a chance to look at this in detail yet (unfortunately, other projects are taking priority at the moment), but I just wanted to jump in and say I like the idea of providing optimised versions of some of these classes, and it looks like you've achieved a really impressive speedup there.

One thing that I have been experimenting with is whether an NAudio 2.0 should fully embrace Span<T>. I made a quick proof of concept a while ago in this repo. With this approach ISampleProvider would look like this

If I do get time to create that sort of NAudio 2.0, then making use of new features like CPU intrinsics would definitely be of use. There are also some benefits to using the new compiler intrinsics to access the Calli instruction, which would help resolve some threading headaches with the COM interop.

For the existing NAudio 1.9, I'm happy to take PRs for any performance changes that don't break the public interface and that have been adequately tested.

markheath, Nov 02 '19 14:11

I would say the use of Span in the way you suggest for NAudio 2.0 is ideal. The existing offset and sampleCount simply become:

volumeSampleProvider.Read(buffer.Slice(offset, sampleCount));
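
To make the idea concrete, here is a minimal sketch of what a Span-based provider could look like. It is not the proof of concept mentioned above: `ISpanSampleProvider`, `ConstantSource` and `SpanVolumeProvider` are hypothetical names, and the single-argument `Read(Span<float>)` signature is an assumption inferred from the call shown above.

```csharp
using System;

// Hypothetical interface for illustration; the real NAudio 2.0 design may differ.
public interface ISpanSampleProvider
{
    // Fills the buffer with samples and returns how many were written.
    int Read(Span<float> buffer);
}

// Hypothetical test source that emits a constant value of 1.0.
public sealed class ConstantSource : ISpanSampleProvider
{
    public int Read(Span<float> buffer)
    {
        buffer.Fill(1f);
        return buffer.Length;
    }
}

// Hypothetical Span-based counterpart to a volume provider.
public sealed class SpanVolumeProvider : ISpanSampleProvider
{
    private readonly ISpanSampleProvider source;

    public SpanVolumeProvider(ISpanSampleProvider source)
    {
        this.source = source;
    }

    public float Volume { get; set; } = 1.0f;

    public int Read(Span<float> buffer)
    {
        // The span already describes the target window, so no offset or count is needed.
        int samplesRead = source.Read(buffer);
        for (int n = 0; n < samplesRead; n++)
        {
            buffer[n] *= Volume;
        }
        return samplesRead;
    }
}
```

If the caller holds a `float[]` rather than a `Span<float>`, the equivalent call would be `provider.Read(buffer.AsSpan(offset, sampleCount))`, since arrays expose `AsSpan` rather than `Slice`.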

macaba, Nov 03 '19 10:11

This is still a very interesting prospect. I'd like to see these tests done in .NET 5 to see if there's even more of a performance bump.

Are we going to see something like this in the next version of NAudio?

Zintom, Jun 12 '21 17:06

.NET 5 doesn't provide any additional gain in the case where CPU intrinsics are used. (Understandably: CPU intrinsics are the absolute fastest way to do this, with a 10x speed improvement.)

[image: benchmark results with CPU intrinsics]

When no CPU intrinsics are used, simply moving from a .NET Core runtime to .NET 5 does not yield any improvement.

[image: benchmark results without CPU intrinsics]

macaba, Jun 12 '21 22:06

Still, CPU intrinsics appear to be a clear win, even if there's no difference between 3.1 and 5.0.

Zintom, Jun 13 '21 17:06

Yep, another example of CPU intrinsic speed up is in #794 if you’re curious.

macaba, Jun 13 '21 21:06

This is very nice and should probably be merged, @markheath.

jwosty, Aug 27 '21 21:08