ZstdNet
Shared Compressor fails in multithreaded code
I compiled sources of ZstdNet for FW 4.8. Plain usage:
var compressor = new Compressor();
data = File.ReadAllBytes(f.Filename);
ComprData = compressor.Wrap(data);
...and immediately fails:
System.AccessViolationException HResult=0x80004003
Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source=<Cannot evaluate the exception source>
StackTrace:
ZStdLibFW.dll!ZstdNet.Compressor.Wrap(System.ReadOnlySpan<byte> src, System.Span<byte> dst) Line 90 C#
ZStdLibFW.dll!ZstdNet.Compressor.Wrap(System.ReadOnlySpan<byte> src) Line 56 C#
ZStdLibFW.dll!ZstdNet.Compressor.Wrap(byte[] src) Line 43 C#
It happens in Compressor.cs at line 90:
var dstSize = Options.AdvancedParams != null
    ? ExternMethods.ZSTD_compress2(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length)
    : Options.Cdict == IntPtr.Zero
        ? ExternMethods.ZSTD_compressCCtx(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length, Options.CompressionLevel)
        : ExternMethods.ZSTD_compress_usingCDict(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length, Options.Cdict);
Any idea why memory management fails?
@LexaGV Could you please provide a more complete minimal example that reproduces the problem? Is the problem still reproducible with the version from NuGet? Is it still reproducible with another file (e.g. an empty one)?
Also the following information may be useful:
- Sources commit revision
- What changes are made
- Build Target
- Debug/Release
- x86/x64
- Libzstd version
In general, the AccessViolationException may indicate that the Compressor/CompressionOptions was disposed before use (but that doesn't seem to be the case for the snippet above)
Nope, it's not the NuGet package - I compiled my own .NET Framework 4.8 version. Seems I found what the problem is. Full example:
var compressor = new Compressor();
Parallel.ForEach(Disk.EnumFiles(@"C:\DEV\", ".vs", ".git", ".hg", "bin", "obj", "bin_Debug", "bin_Release"), ff => {
if (!ff.EndsWith(".cs")) return;
var data = File.ReadAllBytes(ff);
var comprData = compressor.Wrap(data);
Log.Debug("Compr len=" + comprData.Length);
});
THIS example fails with the memory problem described above. But if you move var compressor = new Compressor(); inside the loop, just before var comprData = compressor.Wrap(data);, everything works smoothly. Which makes me think Compressor cannot be shared across threads and is not re-entrant. I know it's not a huge problem to create a compressor every time you need one, but I thought that on a HUGE list of files a shared Compressor could improve performance.
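For reference, a minimal sketch of the working variant described above, with a fresh Compressor created (and disposed) per file; Disk.EnumFiles and Log.Debug are helpers from the original snippet:

```csharp
// Per-file Compressor: each iteration owns its own instance,
// so no unmanaged compression context is shared across threads.
Parallel.ForEach(Disk.EnumFiles(@"C:\DEV\", ".vs", ".git", ".hg", "bin", "obj", "bin_Debug", "bin_Release"), ff =>
{
    if (!ff.EndsWith(".cs")) return;
    var data = File.ReadAllBytes(ff);
    using (var compressor = new Compressor())   // fresh instance per file
    {
        var comprData = compressor.Wrap(data);
        Log.Debug("Compr len=" + comprData.Length);
    }
});
```

The trade-off is exactly the one raised here: the code is safe, but every iteration pays for allocating and freeing an unmanaged context.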
Instances of the Compressor class are not thread safe (as stated in the readme). If you want to optimize bulk processing performance you can use, for example:
- A single Compressor instance with lock
- ThreadStatic or ThreadLocal<T> instances
- A ConcurrentBag of instances with SemaphoreSlim
- Object pooling
- etc.
Here is an example with ObjectPool<T>
// Read the docs: https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-8.0
int parallelism = 4;
var pool = new DefaultObjectPool<Compressor>(new DefaultPooledObjectPolicy<Compressor>(), parallelism);
Parallel.ForEach(Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory, "*.cs"), new ParallelOptions {MaxDegreeOfParallelism = parallelism}, file =>
{
var data = File.ReadAllBytes(file);
var compressor = pool.Get();
try
{
var compressed = compressor.Wrap(data);
Console.WriteLine(compressed.Length);
}
finally
{
pool.Return(compressor);
}
});
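Of the options listed above, ThreadLocal<T> is another workable shape: one long-lived Compressor per worker thread, no locking and no pool bookkeeping. A minimal sketch (assuming the same ZstdNet API as in the pool example; trackAllValues lets us dispose every per-thread instance at the end):

```csharp
// One Compressor per thread, created lazily on first use by that thread.
var local = new ThreadLocal<Compressor>(() => new Compressor(), trackAllValues: true);

Parallel.ForEach(Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory, "*.cs"), file =>
{
    var data = File.ReadAllBytes(file);
    var compressed = local.Value.Wrap(data);    // this thread's private instance
    Console.WriteLine(compressed.Length);
});

// Dispose every per-thread instance once the parallel loop is done.
foreach (var c in local.Values) c.Dispose();
local.Dispose();
```

Compared to the pool, this never shares an instance between threads at all, at the cost of keeping one Compressor alive per thread for the whole run.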
Dmitry, thanks for such a descriptive and helpful answer! The one thing I don't like is "dancing around" the class. Is it possible to make Compressor itself reentrant? I'm not sure what state you need to keep once you compress data. All necessary variables could be declared locally, so Compressor would be a static class and all you'd need is to call "SqueezeMyData()"! :)
Adding another simplified API is a possible option. But it would not be as efficient, because it would require creating and freeing an unmanaged context on each Wrap call. Instances of the Compressor class allow reusing an unmanaged context. See also the native zstd docs:
When compressing many times, it is recommended to allocate a context just once, and reuse it for each successive compression operation. This will make workload friendlier for system's memory.
Note : re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical.
Note 2 : In multi-threaded environments, use one different context per thread for parallel execution.
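A static "SqueezeMyData"-style entry point like the one suggested above could still reuse contexts internally by hiding a pool behind it. A hypothetical sketch (Zstd and Squeeze are made-up names, not part of ZstdNet):

```csharp
// Hypothetical static wrapper: exposes the "just call one method" API,
// while reusing Compressor instances (and thus unmanaged contexts)
// through a thread-safe bag under the hood.
public static class Zstd
{
    private static readonly ConcurrentBag<Compressor> Pool = new ConcurrentBag<Compressor>();

    public static byte[] Squeeze(byte[] src)
    {
        if (!Pool.TryTake(out var compressor))
            compressor = new Compressor();      // grow the pool on demand
        try
        {
            return compressor.Wrap(src);
        }
        finally
        {
            Pool.Add(compressor);               // return the instance for reuse
        }
    }
}
```

This keeps the per-thread-context advice from the zstd docs intact: a Compressor taken from the bag is used by exactly one thread until it is returned.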
Dmitriy, thanks for the explanation! I'm glad to chat with such a qualified engineer with such deep knowledge. Then your solution is ideal, which is rare for FOSS. :) For my peace of mind I ran a few tests - compressing the *.cs sources of FW. Sources were on a RAM drive (PF = Parallel.ForEach):
Default compression:
Pool+PF: Spent=00:00:00.627 on 18296 files, Avg ratio = 0.1898
PF: Spent=00:00:01.425 on 18296 files, Avg ratio = 0.1898
foreach: Spent=00:00:02.755 on 18296 files, Avg ratio = 0.1898
Max compression:
Pool+PF: Spent=00:00:30.078 on 18296 files, Avg ratio = 0.159
PF: Spent=00:00:30.275 on 18295 files, Avg ratio = 0.159
foreach: Spent=00:02:12.764 on 18296 files, Avg ratio = 0.159
(Looks like the pool gains nothing when each thread works long enough. And max compression is almost identical to the default one, but takes much more resources!)
Perfect result! The issue can be closed. And I highly recommend including the Pool sample in the README - it's a huge help for novices! Thanks for all your work and time!