Add support for Zstd coders

Open • kellen opened this issue 3 months ago • 1 comment

Adds

  • saveAsZstdDictionary, which trains a Zstd dictionary on an arbitrary SCollection[T]. It estimates the average size of the elements of type T, collects n elements based on a target training set size, then trains and saves the Zstd dictionary.
  • A Scala ZstdCoder object with transform Coders for a plain T or for each side of a (K, V).
  • A command-line argument that maps a type to a dictionary, causing instances of that type (MyClass in the example below) to get Zstd compression automagically. Probably fails if the type is parameterized. Example: --zstdDictionary=com.spotify.scio.MyClass:gs://bucket/path/dict.bin
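
As a rough illustration of two steps described in the list above, here is a minimal Scala sketch. It is not the code from this change: the object name ZstdDictionarySketch and the helpers sampleCount and parseMapping are hypothetical, and it assumes that n is simply the target training set size divided by the estimated average element size (rounded up), and that the argument value is split at its first colon.

```scala
// Hypothetical helpers for illustration only; none of these names are Scio API.
object ZstdDictionarySketch {

  /** Number of elements to collect so the training set is roughly `targetBytes`.
    * Assumption: n = ceil(targetBytes / avgElementBytes), and at least 1.
    */
  def sampleCount(avgElementBytes: Long, targetBytes: Long): Long = {
    require(avgElementBytes > 0, "average element size must be positive")
    require(targetBytes > 0, "target training set size must be positive")
    math.max(1L, (targetBytes + avgElementBytes - 1) / avgElementBytes)
  }

  /** Splits a value of the form "<fully.qualified.Class>:<dictionary path>".
    * Only the first colon is used as the separator, because the path itself
    * (for example a gs:// URI) contains colons.
    */
  def parseMapping(value: String): Either[String, (String, String)] = {
    val idx = value.indexOf(':')
    if (idx <= 0 || idx == value.length - 1)
      Left(s"expected <class>:<path>, got '$value'")
    else
      Right((value.substring(0, idx), value.substring(idx + 1)))
  }
}
```

With these assumptions, parsing com.spotify.scio.MyClass:gs://bucket/path/dict.bin gives the class name com.spotify.scio.MyClass and the path gs://bucket/path/dict.bin. A parameterized type name would pass through this split unchanged, so any trouble with parameterized types would come from resolving the type afterwards, not from the parsing.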

kellen • Apr 02 '24 16:04