
OCFReader should be concurrent

Open • ThomasHabets opened this issue 3 years ago • 1 comment

When the processing code is fast, decompression becomes the bottleneck. The Avro format compresses each block separately, right? So it should be possible to decompress several blocks concurrently and put more CPU cores to work.

Decompression is by far the bottleneck in my pipeline because OCFReader is single-threaded.
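To make the idea concrete, here is a minimal sketch of decompressing independent blocks on a worker pool, using only the Go standard library. It is not goavro code and not an existing goavro API: `mustCompress`, `inflate`, and `decompressBlocks` are hypothetical names, and the sketch assumes the blocks use raw DEFLATE (as Avro's `deflate` codec does) and have already been split out of the file.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sync"
)

// mustCompress is a hypothetical helper that produces a raw-DEFLATE
// payload, standing in for one compressed block read from a file.
func mustCompress(data []byte) []byte {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(data); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// inflate decompresses a single raw-DEFLATE block.
func inflate(block []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(block))
	defer r.Close()
	return io.ReadAll(r)
}

// decompressBlocks (hypothetical) inflates the blocks on up to `workers`
// goroutines. Result i always corresponds to input block i, so record
// order is preserved even though blocks finish out of order.
func decompressBlocks(blocks [][]byte, workers int) ([][]byte, error) {
	if workers < 1 {
		workers = 1
	}
	out := make([][]byte, len(blocks))
	errs := make([]error, len(blocks))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so the
			// writes to out[i] and errs[i] never overlap.
			for i := range jobs {
				out[i], errs[i] = inflate(blocks[i])
			}
		}()
	}
	for i := range blocks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("block %d: %w", i, err)
		}
	}
	return out, nil
}

func main() {
	blocks := [][]byte{
		mustCompress([]byte("first block")),
		mustCompress([]byte("second block")),
		mustCompress([]byte("third block")),
	}
	out, err := decompressBlocks(blocks, 4)
	if err != nil {
		panic(err)
	}
	for i, b := range out {
		fmt.Printf("block %d: %d bytes\n", i, len(b))
	}
}
```

This version holds every block in memory at once, which keeps the sketch short. A streaming reader would probably need to bound the number of in-flight blocks and hand results back in order as they complete.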

ThomasHabets, May 28 '21 17:05

I hacked together some ugly code to do this, and on my 12-thread (6-core) laptop it increases speed by about 3x.

Still, not super fast.

ThomasHabets, May 29 '21 14:05