Varying compression level & codec
A community user, Mark Gardner from Virginia Tech, submitted the following feature request based on his use case. He gave us permission to quote him verbatim and attribute the request to him.
I really need to be able to change the compression level. We never have enough storage (and it doesn't look like we will ever have enough storage as I fill it up as soon as it is available) so compressing it heavily on ingestion is a high priority for us. From my perusal of the code, the compression level is hard coded at a fairly low value. We need that knob to be exposed so I can set it higher.
I really want/need zstd support for the same reasons. Zstd + compression level would allow me to perform tradeoffs on the compression speed vs size continuum much better than any other codec I've worked with. Further, Zstd is asymmetric; compression may be slow but decompression fast (and only mildly proportional to the compression level). This would allow us to ingest and heavily compress into the data lake while not sacrificing query speed by much if any. Under the theory that queries will be more frequent than ingestion, being able to use zstd with a high compression level would be a huge win.
I started to look at adding both features to the code but realized that it would require some refactoring before being able to add them elegantly. We are under enough pressure that it was deemed more advisable to compress the Zeek JSON logs as is (using zstd -14) than spend time adding those features to the code. It has been a while since I looked at it but one of the directions (compress or decompress) looked fairly straight forward to do but the other would require refactoring to modularize the code before being able to add a new codec.
In my opinion, configurable compression would be a very useful feature. Currently, when importing a 2 GB CSV file, the imported file is about 1 GB or more. If the compression level were increased, the file size would be much smaller.
Search time and CPU load might increase as a result, but less memory would be consumed.
Ideally, the user could choose: lower the compression level if processing speed matters more, or raise it if storage space is the priority.
Thank you for your efforts in developing this tool.
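
To make the speed-versus-size tradeoff discussed in this thread concrete, here is a minimal sketch in Python. It uses the standard library's `zlib` as a stand-in, because zstd is not assumed to be available; it is not the tool's actual code path. The helper `compress_at_levels` and the sample log lines are hypothetical and exist only for illustration.

```python
import time
import zlib


def compress_at_levels(data: bytes, levels=(1, 6, 9)):
    """Return {level: (compressed_size_in_bytes, seconds)} for each zlib level.

    Hypothetical helper: zlib stands in for whichever codec is being tuned.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Confirm the round trip is lossless before recording the numbers.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


# Made-up sample data: repetitive JSON-like log lines, loosely resembling
# the kind of records mentioned above.
sample = b"".join(
    b'{"ts": %d, "id.orig_h": "10.0.0.%d", "proto": "tcp"}\n' % (i, i % 256)
    for i in range(20000)
)

if __name__ == "__main__":
    for level, (size, secs) in compress_at_levels(sample).items():
        ratio = size / len(sample)
        print(f"level {level}: {size} bytes ({ratio:.1%} of original) in {secs:.4f}s")
```

Running this on representative data gives a rough idea of how much extra CPU time a higher level costs and how much space it saves. The actual numbers depend heavily on the data and the codec, so treat them as indicative only.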