Zstd compression with dictionary based on schema
Prerequisites
- [x] I have written a descriptive issue title
- [x] I have searched existing issues to ensure the feature has not already been requested
🚀 Feature Proposal
Hello!
I was just thinking: would it be possible to create a zstd dictionary based on the document schema to make compression and decompression way faster?
Motivation
Increase performance
Example
No response
I took a look and, while this is an interesting idea, I don't think Mongoose can support this right now because the MongoDB Node driver uses the custom @mongodb-js/zstd implementation of zstd, which doesn't support dictionary compression. The current API is just compress(data, compressionLevel) and decompress(data), with no dictionary parameter. Do you have any ideas for working around this, @billouboq?
Drivers have considered dictionary support in the past but decided not to implement this feature (https://jira.mongodb.org/browse/DRIVERS-2396). Supporting it would require server changes, because the client and server must share the same dictionary for compression and decompression, and that breaks the stateless behavior of existing client + server compression.
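To illustrate why both sides need the same dictionary, here is a minimal sketch. It is not zstd and not the driver's code path: it swaps in Node's built-in `zlib` deflate, which accepts a preset `dictionary` option, purely as a stand-in. The dictionary contents and the sample document are made up for illustration.

```javascript
const zlib = require('node:zlib');

// Hypothetical dictionary: field names a schema might define (an assumption).
const dictionary = Buffer.from(
  '{"_id":"","firstName":"","lastName":"","email":"","createdAt":""}'
);

// Hypothetical document, serialized as JSON for simplicity.
const doc = Buffer.from(JSON.stringify({
  _id: '1',
  firstName: 'Ada',
  lastName: 'Lovelace',
  email: 'ada@example.com',
  createdAt: '2024-01-01'
}));

// Compress without and with the preset dictionary.
const plain = zlib.deflateSync(doc);
const withDict = zlib.deflateSync(doc, { dictionary });
console.log('bytes without dictionary:', plain.length);
console.log('bytes with dictionary:', withDict.length);

// A receiver that holds the same dictionary can decompress.
const roundTrip = zlib.inflateSync(withDict, { dictionary });

// A receiver without the dictionary cannot.
let failedWithoutDictionary = false;
try {
  zlib.inflateSync(withDict);
} catch (err) {
  failedWithoutDictionary = true;
}
console.log('decompress without dictionary failed:', failedWithoutDictionary);
```

The same constraint would apply to zstd dictionaries: whoever decompresses has to hold the dictionary that was used to compress, which is the state the current wire compression avoids.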
I'm also open to suggestions about what creating a dictionary based on a schema might look like, but all the underlying zstd APIs for creating dictionaries train the dictionary from sample documents. I'm not sure what that would look like in Mongoose: would example documents be generated from the schema, serialized to bytes, and then fed into the trainer? Or something else?
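For discussion, the "generate example documents from the schema and serialize them" idea could be sketched roughly as follows. Everything here is an assumption: `schemaPaths` is a hypothetical plain object standing in for a Mongoose schema, `placeholderFor` and `generateSamples` are made-up helpers, and JSON stands in for BSON. The training step itself is left out because it needs a zstd binding that exposes the trainer; the output is shaped as one concatenated buffer plus a list of sample sizes, which is the input form zstd's `ZDICT_trainFromBuffer` C function takes.

```javascript
// Hypothetical schema description: path name -> type name (not a real Mongoose schema).
const schemaPaths = {
  name: 'String',
  age: 'Number',
  active: 'Boolean',
  createdAt: 'Date'
};

// Return a placeholder value for a type; `i` varies the value between samples.
function placeholderFor(type, i) {
  switch (type) {
    case 'String': return `sample-${i}`;
    case 'Number': return i;
    case 'Boolean': return i % 2 === 0;
    case 'Date': return new Date(Date.UTC(2024, 0, 1 + i)).toISOString();
    default: return null;
  }
}

// Build `count` synthetic documents and serialize each one to bytes.
function generateSamples(paths, count) {
  const samples = [];
  for (let i = 0; i < count; i++) {
    const doc = {};
    for (const [path, type] of Object.entries(paths)) {
      doc[path] = placeholderFor(type, i);
    }
    // JSON is used here for simplicity; real documents would be BSON.
    samples.push(Buffer.from(JSON.stringify(doc)));
  }
  return samples;
}

const samples = generateSamples(schemaPaths, 100);

// Trainer-style input: all samples back to back, plus the size of each one.
const samplesBuffer = Buffer.concat(samples);
const sampleSizes = samples.map((s) => s.length);
console.log('samples:', samples.length, 'total bytes:', samplesBuffer.length);
```

One caveat with this approach: synthetic samples mostly capture field names and structure, not the values real documents contain, so a dictionary trained this way may help less than one trained on production data.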
I'm going to close this issue for now since there isn't a way for Mongoose to reasonably implement zstd dictionary support without significant changes from the MongoDB Node driver.