Fast Node.js version that lives up to the benchmarks

Open Awendel opened this issue 1 year ago • 6 comments

By far the strongest use case for zstd is in databases such as Redis that hold many small, similar objects, where one can pretrain a dictionary on the data. A trained zstd dictionary will outperform Snappy by a factor of 3-4.

Yet there doesn't seem to be a decent Node.js version that lives up to the benchmarks posted in the repository.

Officially, zstd is meant to outperform Snappy at decompression, yet the only working and maintained Node.js version I found, https://github.com/yoshihitoh/zstd-codec, is sadly about 3x slower than Snappy, especially at decompression, where it counts.

It would be great if someone could point me to an existing Node.js version that comes closer to the official benchmark values, or consider implementing one.

Ideally with the following requirements:

  • only for Node.js, not the browser environment (so it can use Buffers, C++ addons, etc.)
  • performance comparable to the original implementation
  • dictionaries can be passed in simply as Buffers
  • uses the latest version of zstd and not an outdated one
  • can be used in Node.js projects of type module (the new ES6 module syntax)

Since Node.js is one of the most widely used server environments, this would greatly help advance the zstd project.

For snappy in Node js, there is for example the excellent: https://github.com/Brooooooklyn/snappy

A node js version of zstd with the same quality standard & performance would be ideal.

EDIT: The Node.js versions that are linked from the zstd website should be removed, since they haven't been maintained in 6-7 years and have huge memory leak issues / are unstable.

Awendel avatar Jul 09 '22 17:07 Awendel

The closest Node.js binding I can think of corresponding to this request would be @Stieneee's simple-zstd: https://www.npmjs.com/package/simple-zstd

featuring a great blog entry explaining the genesis and outcome: https://tylerstiene.ca/blog/zstandard-compression-for-nodejs/ and even a benchmark script for quick evaluation : https://github.com/Stieneee/node-compression-test

I suspect it doesn't support dictionary compression yet, but it might be expandable, given that it's designed to link to the system's dynamic library, where the dictionary compression API has been stable for a while now.

edit: the list of JavaScript / Node.js ports and bindings has been updated on Zstandard's homepage.

Cyan4973 avatar Jul 09 '22 23:07 Cyan4973

Thanks for the prompt reply!

I came across simple-zstd before, but I think there may be some fundamental problems with it:

  • a confusing API compared to other libraries, where encode / decode on buffer-like data is a lot more straightforward
  • no dictionary support (apparently)
  • the overhead involved with creating / destroying a child process for each conversion

Overall, it seems that simple-zstd was designed more for compressing / decompressing (large) files.

I didn't run my own benchmark yet, but I suppose it wouldn't perform so well with lots of small data in the 1KB range, because of the child process overhead. There would have to be some sort of pool of processes that get reused or something.

I still believe the most compelling use case for zstd, where it can really shine against other compression algorithms, is compressing database entries, for example JSON-like entries in Redis, MongoDB, etc., especially in conjunction with dictionary support that can be trained on the data structures. While there are some options like zstd-codec that work very well, sadly their performance is not yet in the same ballpark as Snappy for Node.js, which is also an important point for use with databases, since latency is crucial.

Awendel avatar Jul 10 '22 09:07 Awendel

You are correct on all points. When I built the library, I was just addressing my use case for large files, and the stream interface was the logical interface.

I would be concerned about the child process overhead for your use case as well. A pool of pre-spawned instances might be easy enough to implement. In addition, dictionary support could be added, and encode / decode support would just be an extension of the current implementation. If you are willing to test/benchmark, I would be happy to take a swing at adding those features to the library.
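
A minimal sketch of what such a pool of pre-spawned instances could look like. `PrespawnPool`, `create`, and `take` are hypothetical names, not part of simple-zstd, and the factory below returns plain objects as stand-ins so the sketch runs without a zstd binary installed. Since a process that reads one stream until EOF cannot be reused afterwards, the sketch hands out a ready instance and immediately creates its replacement, which moves the spawn cost off the hot path rather than removing it.

```javascript
// Hypothetical helper: keeps `size` (>= 1) instances created ahead of time
// so that callers never wait for creation (e.g. a process spawn).
class PrespawnPool {
  constructor(create, size) {
    this.create = create;
    this.idle = Array.from({ length: size }, () => create());
  }

  // Hand out a ready instance and immediately start a replacement.
  take() {
    const instance = this.idle.shift();
    this.idle.push(this.create());
    return instance;
  }
}

// Stand-in factory (assumption). With a system-installed zstd it could be
// something like: () => spawn("zstd", ["-d", "-c"]) from node:child_process.
let created = 0;
const pool = new PrespawnPool(() => ({ id: ++created }), 4);

const first = pool.take();
const second = pool.take();
console.log(first.id, second.id, pool.idle.length, created);
```

Whether this helps enough for entries in the 1KB range would still have to be benchmarked, since every conversion still costs one process.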

Stieneee avatar Jul 10 '22 20:07 Stieneee

Sounds like a great idea and I'm definitely ready to test and benchmark.

The theoretical structure of the project is extremely promising. Once properly implemented, it could become the de facto Node.js standard for zstd, especially since it is not tied to a specific zstd release, which makes it very future-proof.

In terms of the dictionary API, one could make it class / constructor based, essentially passing in the dictionary buffer at construction, and perhaps also the number of internal child processes for the pool.

This is similar to the C API, where it is recommended to load the dictionary once and then reuse the loaded dictionary instead of reparsing it on every compress invocation, which is computationally expensive.
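
To illustrate only the shape of such a constructor-based API, here is a sketch that uses Node's built-in zlib deflate with a preset dictionary as a stand-in for zstd. `DictCodec`, `encode`, and `decode` are hypothetical names, and the hand-written dictionary is an assumption rather than a trained one. Unlike a loaded zstd dictionary, zlib re-applies the dictionary on every call, so this shows the interface and the size benefit for small entries, not the load-once optimization.

```javascript
const zlib = require("node:zlib");

// Hypothetical wrapper: the dictionary is supplied once at construction
// and reused for every call. zlib's deflate stands in for zstd here.
class DictCodec {
  constructor(dictionary) {
    this.options = { dictionary };
  }

  encode(buf) {
    return zlib.deflateSync(buf, this.options);
  }

  decode(buf) {
    return zlib.inflateSync(buf, this.options);
  }
}

// Hand-written dictionary containing the keys the entries share (assumption).
const dictionary = Buffer.from('{"id":,"name":"","email":"","active":true}');
const codec = new DictCodec(dictionary);

const entry = Buffer.from(
  '{"id":42,"name":"Ada","email":"ada@example.com","active":true}'
);
const withDict = codec.encode(entry);
const withoutDict = zlib.deflateSync(entry);

// Compare compressed sizes and confirm the round trip.
console.log(withDict.length, withoutDict.length);
console.log(codec.decode(withDict).equals(entry));
```

The same dictionary buffer has to be available on the decoding side, which is why passing it at construction keeps call sites simple.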

Awendel avatar Jul 10 '22 22:07 Awendel

So far this discussion has only touched on two vastly different approaches, each with their own drawbacks:

  1. Compile zstd to WebAssembly. Necessary for browser support, but it makes zstd awkward to use in Node.js and, as you found out, the performance is not stellar.
  2. Call out to a child process. You already mentioned some of the drawbacks, but another drawback is simply that it relies on a system-installed zstd binary. This means you can't easily support multiple different versions of zstd, and it requires an explicit installation step rather than relying on Node's dependency management. This is probably workable for applications, but as a library author who would like to use zstd, it's not great.
  3. This wasn't mentioned yet, but you could also attempt to port zstd to JS. That comes with a huge maintenance burden, and you'll likely have the same or worse performance as the WebAssembly approach.

I'm surprised no one has discussed native bindings yet. The bindings I've found available today are not in great shape and need to be recompiled for new versions of Node because they are using NAN. Nowadays we have Node-API and node-addon-api, which make it a lot easier and, most importantly, offer a stable ABI so that you don't need to recompile for every Node version.

To be able to use zstd from Node specifically (not browsers), writing a set of native bindings using node-addon-api seems like the best choice.

Nevon avatar Jul 27 '22 09:07 Nevon

Those are some very good points and a promising suggestion!

Indeed, if one wanted to create a larger npm module that makes use of zstd as a dependency, your suggestion really does seem like the only right way moving forward.

Awendel avatar Jul 27 '22 10:07 Awendel

@Awendel did you have any luck with this search?

I initially found https://github.com/101arrowz/fzstd when googling, but found it too slow for my case: it was taking ~200ms to decompress my data (20 MB), which is not acceptable in a browser. So I moved it to the server side, where it takes ~700ms. I presumed it was slow because it was written in JavaScript, to support browsers.

Went looking for alternatives, to see if there was any lib using native code, but they are even slower or don't work at all:

  • @bokuweb/zstd-wasm
  • @hpcc-js/wasm
  • @oneidentity/zstd-js
  • @skhaz/zstd

fzstd seems to be the best option for me, even though it's not that fast.

ianldgs avatar Nov 17 '23 12:11 ianldgs

@ianldgs I gave up on finding a reliable Node.js zstd implementation. I ended up using Node's native brotli compression at the lowest compression level, which is very fast, compresses to a decent size, and is stable enough for production.
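
For reference, a minimal sketch of that fallback using Node's built-in zlib module. The sample entry is made up for illustration, and the sync variants are used only to keep the example short.

```javascript
const zlib = require("node:zlib");

// Lowest brotli quality: trades compression ratio for speed.
const fastest = {
  params: {
    [zlib.constants.BROTLI_PARAM_QUALITY]: zlib.constants.BROTLI_MIN_QUALITY,
  },
};

// Made-up, repetitive JSON payload standing in for a database entry.
const entry = Buffer.from(
  JSON.stringify({ id: 42, name: "Ada", tags: ["a", "b", "c"] }).repeat(20)
);

const packed = zlib.brotliCompressSync(entry, fastest);
const restored = zlib.brotliDecompressSync(packed);

// Original size, compressed size, and round-trip check.
console.log(entry.length, packed.length, restored.equals(entry));
```

In a latency-sensitive server, the callback or promisified `brotliCompress` / `brotliDecompress` variants avoid blocking the event loop.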

Awendel avatar Nov 21 '23 23:11 Awendel

I haven't tried benchmarking this, but you have https://github.com/mongodb-js/zstd. Maybe worth a try

Hazmi35 avatar Nov 22 '23 08:11 Hazmi35

@Hazmi35 thanks, I've tested it and did some basic benchmarks. It works and it's much faster than fzstd, at least 50x faster. The only drawback for me is that I have a Next.js project, so it needs a custom webpack loader for the .node files (https://www.npmjs.com/package/nextjs-node-loader).

ianldgs avatar Nov 22 '23 12:11 ianldgs

mongodb-js/zstd has been added to the list of JavaScript / Node.js ports.

Cyan4973 avatar Feb 13 '24 21:02 Cyan4973