ZstdNet icon indicating copy to clipboard operation
ZstdNet copied to clipboard

Shared Compressor fails in multithreaded code

Open LexaGV opened this issue 1 year ago • 6 comments

I compiled sources of ZstdNet for FW 4.8. Plain usage:

var compressor = new Compressor();
data = File.ReadAllBytes(f.Filename);
ComprData = compressor.Wrap(data);

...and immediately fails:

System.AccessViolationException HResult=0x80004003 Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt. Source=<Cannot evaluate the exception source> StackTrace: ZStdLibFW.dll!ZstdNet.Compressor.Wrap(System.ReadOnlySpan src, System.Span dst) Line 90 C# ZStdLibFW.dll!ZstdNet.Compressor.Wrap(System.ReadOnlySpan src) Line 56 C# ZStdLibFW.dll!ZstdNet.Compressor.Wrap(byte[] src) Line 43 C#

It happens in Compressor.cs[90]:

var dstSize = Options.AdvancedParams != null
			    ? ExternMethods.ZSTD_compress2(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length)
			    : Options.Cdict == IntPtr.Zero
				    ? ExternMethods.ZSTD_compressCCtx(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length, Options.CompressionLevel)
				    : ExternMethods.ZSTD_compress_usingCDict(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length, Options.Cdict);

Any idea why memory management fails?

LexaGV avatar Apr 12 '24 21:04 LexaGV

@LexaGV Could you please provide a more complete minimal example that reproduces the problem? Is the problem still reproducible with the version from the NuGet? Is the problem still reproducible with another file (e.g. empty)

Also the following information may be useful:

  • Sources commit revision
  • What changes are made
  • Build Target
  • Debug/Release
  • x86/x64
  • Libzstd version

In general, the AccessViolationException may indicate that the Compressor/CompressionOptions was disposed before use (but that doesn't seem to be the case for the snippet above)

dscheg avatar Apr 21 '24 14:04 dscheg

Nope, it's not nuget - I compiled my own .NET Framework 4.8 version. Seems I found what's a problem. Full example:

var compressor = new Compressor();
Parallel.ForEach(Disk.EnumFiles(@"C:\DEV\", ".vs", ".git", ".hg", "bin", "obj", "bin_Debug", "bin_Release"), ff => {
    if (!ff.EndsWith(".cs")) return;
    var data = File.ReadAllBytes(ff);
    var comprData = compressor.Wrap(data);
    Log.Debug("Compr len=" + comprData.Length);
});

THIS example fails w memory problem described above. But if you put var compressor = new Compressor(); before var comprData = compressor.Wrap(data);, everything works smooth. What makes me thinking Compressor cannot be shared across threads and it's not re-entrant. I know, it's not a huge problem - to create compressor every time you need, but I thought on HUGE list of files shared Compressor could improve performance.

LexaGV avatar Apr 22 '24 09:04 LexaGV

Instances of the Compressor class are not thread safe (as stated in the readme). If you want to optimize bulk processing performance you can use for example

  • Single Compressor instance with lock
  • ThreadStatic or ThreadLocal<T> instances
  • ConcurrentBag of instances with SemaphoreSlim
  • Object pooling
  • etc

Here is an example with ObjectPool<T>

// Read the docs: https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-8.0
int parallelism = 4;
var pool = new DefaultObjectPool<Compressor>(new DefaultPooledObjectPolicy<Compressor>(), parallelism);
Parallel.ForEach(Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory, "*.cs"), new ParallelOptions {MaxDegreeOfParallelism = parallelism}, file =>
{
    var data = File.ReadAllBytes(file);
    var compressor = pool.Get();
    try
    {
        var compressed = compressor.Wrap(data);
        Console.WriteLine(compressed.Length);
    }
    finally
    {
        pool.Return(compressor);
    }
});

dscheg avatar Apr 22 '24 17:04 dscheg

Dmitry, thanks for such descriptive and helpful answer! The one thing I don't like is "dancing around" the class. Is it possible to make Compressor itself as a reentrant? Not sure what "status" you need to keep once you compress data. All necessary variables can be declared locally, so Compressor will be static class and all you need is just call "SqueezeMyData()"! :)

LexaGV avatar Apr 23 '24 15:04 LexaGV

Adding another simplified API is a possible option. But it will not be as efficient because it will require create/free an unmanaged context on each wrap. Instances of the Compressor class allow to reuse an unmanaged context. Check also native zstd docs

When compressing many times, it is recommended to allocate a context just once, and reuse it for each successive compression operation. This will make workload friendlier for system's memory.

Note : re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical.

Note 2 : In multi-threaded environments, use one different context per thread for parallel execution.

dscheg avatar Apr 24 '24 05:04 dscheg

Dmitriy, thanks for explanation! I'm glad to chat with such qualified engineer with so deep knowledge. Then your solution is ideal, what is rare for FOSS. :) For my peace of mind I made a few tests - compress *.cs sources of FW. Sources were on the RAM drive (PF=Parallel.Foreach):

Default compression:
Pool+PF: Spent=00:00:00.627 on 18296 files, Avg ratio = 0.1898
PF:      Spent=00:00:01.425 on 18296 files, Avg ratio = 0.1898
foreach: Spent=00:00:02.755 on 18296 files, Avg ratio = 0.1898

Max compression:
Pool+PF: Spent=00:00:30.078 on 18296 files, Avg ratio = 0.159
PF:      Spent=00:00:30.275 on 18295 files, Avg ratio = 0.159
foreach: Spent=00:02:12.764 on 18296 files, Avg ratio = 0.159

(Looks like pool gives nothing when each thread works long enough time. And max compression almost identical to default one, but takes much more resources!)

Perfect result! Issue can be closed. And I highly recommend to include Pool sample in README - it's a huge help for novices! Thanks for all your work and time!

LexaGV avatar Apr 24 '24 13:04 LexaGV