Crc32.NET

Support Crc32C Hardware Intrinsics on .NET Core 3

Open nitinag opened this issue 4 years ago • 17 comments

It looks like .NET Core 3 supports hardware intrinsics and has support for CRC32C: https://github.com/dotnet/designs/blob/master/accepted/platform-intrinsics.md

_mm_crc32_u64 https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Sse42.cs

Would be great if the library could use this eventually when the instructions are present on the right platform.

nitinag avatar Jul 31 '19 18:07 nitinag

Yep, I'll try to implement this. It is a good idea, thank you.

force-net avatar Aug 01 '19 06:08 force-net

This implementation claims hardware acceleration support on .NET Core 3.0, but I have not tested it: https://github.com/differentrain/Crc32cSharp

Agagamand avatar Aug 17 '19 11:08 Agagamand

I've got this working for CRC32C and ready to put in a PR as soon as #19 gets merged. In my testing, I'm seeing about a 6x performance improvement for 64-bit processes on a modern Intel processor (this is above and beyond the perf improvements in PR #19). It takes about 6 microseconds to compute the checksum of a 64 KB buffer.

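| Method  | Runtime       | Mean      | Error     | StdDev    | Ratio | Rank |
|---------|---------------|-----------|-----------|-----------|-------|------|
| Default | .NET Core 2.1 | 39.789 us | 0.6658 us | 0.9333 us | 1.00  | 2    |
| Default | .NET Core 3.1 | 6.017 us  | 0.0502 us | 0.0469 us | 0.15  | 1    |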

brantburnett avatar Oct 19 '20 13:10 brantburnett

@brantburnett Do you also have an implementation for CRC32?

Skyppid avatar Nov 10 '20 09:11 Skyppid

@Skyppid Unfortunately, the Intel intrinsic operation is specific to CRC32C, which is based on the polynomial 0x11EDC6F41:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=crc&expand=1288

There may be other ways to use intrinsics to optimize CRC32, but if so I haven't found them yet.

brantburnett avatar Nov 10 '20 13:11 brantburnett
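
To make the polynomial distinction concrete, here is a minimal software sketch in Python (not the library's code, and not hardware accelerated). It assumes the usual reflected, table-driven formulation with an initial value and final XOR of 0xFFFFFFFF. The constant 0x82F63B78 is the bit-reversed form of 0x1EDC6F41, which is 0x11EDC6F41 with the implicit top bit dropped. The function and table names are illustrative only.

```python
import zlib

# Reflected (LSB-first) forms of the two generator polynomials.
CRC32_POLY = 0xEDB88320   # CRC-32 (zlib, PNG); normal form 0x04C11DB7
CRC32C_POLY = 0x82F63B78  # CRC-32C (Castagnoli); normal form 0x1EDC6F41


def make_table(poly):
    """Build the 256-entry lookup table for a reflected CRC polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if a 1 fell off.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc(data, table):
    """Table-driven CRC with init value and final XOR of 0xFFFFFFFF."""
    value = 0xFFFFFFFF
    for b in data:
        value = table[(value ^ b) & 0xFF] ^ (value >> 8)
    return value ^ 0xFFFFFFFF


CRC32_TABLE = make_table(CRC32_POLY)
CRC32C_TABLE = make_table(CRC32C_POLY)

# The same input gives unrelated checksums under the two polynomials,
# which is why an instruction hard-wired to one cannot compute the other.
print(hex(crc(b"123456789", CRC32_TABLE)))   # 0xcbf43926
print(hex(crc(b"123456789", CRC32C_TABLE)))  # 0xe3069283
```

The only difference between the two checksums is the table, so a CRC32C-only instruction cannot be reused for CRC32 by changing its input.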

@Skyppid Yep, only CRC32C can be hardware accelerated.

force-net avatar Nov 10 '20 13:11 force-net

Ah, okay didn't know. Well I saw that CRC32C is the recommended one anyway, so I switched to that.

Skyppid avatar Nov 10 '20 14:11 Skyppid

I did some digging, and I was able to find and implement a CRC32 algorithm using carryless multiplication intrinsics. It's not as big a difference as CRC32C, only about a 25% reduction in runtime instead of 85%, but it's still something. I may get it implemented more completely once the other stuff is merged.

brantburnett avatar Nov 10 '20 16:11 brantburnett

I'm using the CRC32 algorithm right now and it definitely takes a while for my use case (63,000 files, 25GB). I'll be switching to (software) CRC32C to see how much faster that is. I'm fascinated to see if I'll be able to switch to this hardware accelerated version and how much faster that will be.

I'm using .NET 5, so I presume this will apply the same as .NET Core 3.

I just got a new PC with a Core i9 chip, so I'm presuming that will have the Intel CRC32C hardware acceleration. If I wanted to run the same code on older PCs, is there any way to find out what chips support this hardware acceleration? Would a 3 year old i5 work? What about something like an Atom or Celeron?

benwmills avatar Feb 16 '21 16:02 benwmills

@benwmills

Yes, it will apply the same to .NET 5 as Core 3.1. As to processor support, the instructions are included in SSE4.2. This was first introduced on i7 chips starting around 2008.

I don't have hard data, but it seems like most Intel chips from the last few years support it. Here is a list of processors: https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&1_Filter-InstructionSetExtensions=3540

brantburnett avatar Feb 16 '21 16:02 brantburnett
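
For checking a specific machine rather than a product list, one option on Linux is to read the CPU feature flags from `/proc/cpuinfo`. The sketch below is Python and assumes the Linux flag names: `sse4_2` on x86, and `crc32` in the `Features` line on arm64. The helper names are hypothetical, and this only works on Linux.

```python
def cpu_flags(cpuinfo_text):
    """Collect flag names from 'flags' (x86) or 'Features' (arm64) lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def has_hardware_crc(cpuinfo_text):
    """True if the text advertises SSE4.2 (x86) or the CRC32 extension (arm64)."""
    flags = cpu_flags(cpuinfo_text)
    return "sse4_2" in flags or "crc32" in flags
```

Usage would be something like `has_hardware_crc(open("/proc/cpuinfo").read())`. Inside a .NET process, the `Sse42.IsSupported` property answers the same question at runtime.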

Looks like AMD chips also support SSE4.2. Does that mean that they support hardware accelerated CRC32C too? Everything I've read just talks about Intel chips.

Sorry for the naive question, but what happens when you calculate the CRC32C of a file on a network share? Does the file have to be copied to the local machine to calculate the CRC? I don't imagine this is possible, but any chance the chip on the machine hosting the network share (e.g. a Synology) can calculate the CRC?

benwmills avatar Feb 16 '21 16:02 benwmills

I believe that AMD chips which have SSE4.2 will also automatically get the performance improvement, yes. But I'm not an expert.

To my knowledge, to calculate the CRC over a network share you'd either need to run a service on that server and use HTTP or something similar to request the CRC or stream the whole file to your machine to calculate yourself. The only exception would be if there is some built-in support in the SMB protocol Windows uses for file shares. I have no clue on that front, though.

brantburnett avatar Feb 16 '21 17:02 brantburnett

I switched from CRC32 to CRC32C and the speed is the same. For me, about 58,000 milliseconds for 63,000 files.

I'm really interested to see what I'll get with hardware acceleration when this is ready. A six-fold increase in speed would be amazing.

benwmills avatar Feb 17 '21 14:02 benwmills

@benwmills It seems that reading the files is slower than the CRC32 calculation. Also, it is not a good idea to use CRC32 for checking file integrity; SHA1 or something similar is better. CRC checksums are good for relatively small blocks of data, primarily for network transfer.

force-net avatar Feb 17 '21 19:02 force-net

I'm writing some code to sync a folder to another folder. I can't always rely on file size and modification date, so I wanted to compare via some kind of file hash.

I've previously used software (ViceVersa Pro) that uses CRC in these cases, but I'm also aware of other hashes like MD5 or SHA1. This is fairly new to me, so I don't know the pros and cons of CRC vs MD5 vs SHA1. I'm obviously looking for reliability, but speed is huge too. I took the lead from Vice Versa Pro to use CRC and it's working well. To be able to calculate the hash on 63,000 files in less than a minute is pretty impressive to me, but maybe there are better options.

Sorry, I know this is getting off topic in this thread. The software CRC is working great for me. Just really intrigued by the possibility of much faster hardware CRC.

benwmills avatar Feb 17 '21 20:02 benwmills
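
As a rough illustration of the trade-off discussed here, the Python sketch below computes both a CRC32 and a SHA-1 of a file by streaming it in chunks, then compares two files by size first and digest second. It uses the standard `zlib` and `hashlib` modules; the function names and the 1 MiB chunk size are arbitrary choices, not taken from any tool mentioned in the thread.

```python
import hashlib
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """CRC32 of a file, read in chunks so large files do not fill memory."""
    value = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.crc32(chunk, value)  # continue from the running value
    return value


def file_sha1(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b, digest=file_sha1):
    """Cheap size check first; only hash when the sizes match."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return digest(path_a) == digest(path_b)
```

Passing `digest=file_crc32` swaps in the faster but weaker check. Either way every byte still has to be read, so disk or network throughput may dominate the runtime.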

Yeah, it's off topic, but it's OK to discuss it here :)

As I understand it, you detect differences between files by comparing their hashes. In normal situations that is fine, because you also check size and time. But for specially prepared files, collisions can exist that lead to 'false file equality'. For example, an attacker changes a file's content and then adjusts its last bytes to produce the required hash. In reality, though, a lot of file syncing utilities do not look at content at all, just date, size and name, and that works well.

Also, in file syncing operations, network speed and latency can be important. You can look at the rsync algorithm, which uses two hashes: one fast and rolling, the other slow and accurate. This lets it find insertions and deletions in files and sync only a small part of the data.

force-net avatar Feb 17 '21 22:02 force-net
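
The "fast" half of that pair can be sketched as follows. This is a minimal Python illustration of an rsync-style weak rolling checksum, assuming the commonly described two-sum form modulo 2^16; it is not rsync's actual source, and the function names are made up for the example. Once a block's checksum is known, sliding the window by one byte costs O(1) instead of re-summing the whole block.

```python
M = 1 << 16


def weak_checksum(block):
    """Return (a, b): a is the byte sum, b is the position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte at the front, add in_byte at the end."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


data = bytes(range(50)) * 3
size = 16
a, b = weak_checksum(data[:size])
for k in range(len(data) - size):
    a, b = roll(a, b, data[k], data[k + size], size)
    # The rolled value matches a from-scratch computation on the new window.
    assert (a, b) == weak_checksum(data[k + 1:k + 1 + size])
```

In a sync tool, a match on this cheap checksum would only be a candidate. The slower, stronger hash is then computed to confirm that the blocks really are identical.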

> Yep, only CRC32C can be hardware accelerated.

That is probably the case for x86/SSE4.2. But it looks like Arm supports both CRC32 and CRC32C (since ARMv8.1), as does System.Runtime.Intrinsics.Arm (since .NET 5), which exposes both CRC32 and CRC32C.

EduardSergeev avatar Mar 29 '22 05:03 EduardSergeev