goavro
OCFReader should be concurrent
When the processing code is fast, decompression becomes the bottleneck. The Avro format compresses each block separately, right? If so, it should be possible to decompress several blocks concurrently, so that more CPU cores are put to work.
Decompression is by far the bottleneck in my pipeline, because the reader is single-threaded.
I hacked together some ugly code to do this, and on my 12-thread (6-core) laptop it increases speed by about 3x.
Even so, that is still not very fast.
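To make the request concrete, here is a minimal sketch of the idea using only the Go standard library. It is not the hack mentioned above and it does not use any goavro API: `deflateBlock`, `inflateBlock`, and `decompressBlocks` are hypothetical helpers, and the sketch assumes the blocks were written with the `deflate` codec (raw DEFLATE) and have already been read from the file as byte slices.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sync"
)

// deflateBlock (hypothetical) compresses data with raw DEFLATE.
// It only stands in for a compressed block so the sketch is runnable.
func deflateBlock(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflateBlock (hypothetical) decompresses one raw DEFLATE block.
func inflateBlock(block []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(block))
	defer r.Close()
	return io.ReadAll(r)
}

// decompressBlocks (hypothetical) inflates the blocks on `workers`
// goroutines and returns the results in the same order as the input.
func decompressBlocks(blocks [][]byte, workers int) ([][]byte, error) {
	if workers < 1 {
		workers = 1
	}
	out := make([][]byte, len(blocks))
	errs := make([]error, len(blocks))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so the
			// writes to out[i] and errs[i] never overlap.
			for i := range jobs {
				out[i], errs[i] = inflateBlock(blocks[i])
			}
		}()
	}
	for i := range blocks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("block %d: %w", i, err)
		}
	}
	return out, nil
}

func main() {
	var blocks [][]byte
	for i := 0; i < 8; i++ {
		b, err := deflateBlock(bytes.Repeat([]byte{byte('a' + i)}, 1<<16))
		if err != nil {
			panic(err)
		}
		blocks = append(blocks, b)
	}
	out, err := decompressBlocks(blocks, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("decompressed blocks:", len(out))
}
```

Writing results by index keeps the blocks in file order, which matters if records must come out in the order they were written. A real reader would also need to bound how many blocks are in flight rather than holding them all in memory at once.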