add parallelism to add vectors

Open mccanne opened this issue 3 years ago • 0 comments

We should look into increasing the throughput of the add vector operation by adding parallelism. Currently, it runs synchronously, one object at a time, over the list of objects to add.

Since vectorization can require a large memory footprint per object, we should be careful about how we go about this. It may be that the important bottleneck is the CPU, in which case we just want to read one file ahead from the storage system while the CPU is processing the current file, and add parallelism to the ZST encode path, in particular the vector-compression logic.
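
To make the idea concrete, here is a rough sketch of what the two pieces could look like. It is not taken from the existing code: `object`, `addVectors`, `compressColumns`, and the `fetch`, `encode`, and `compress` callbacks are all hypothetical names standing in for the real storage read, ZST encode, and per-vector compression steps. The unbuffered channel is what bounds memory: the reader can hold at most one fetched object while another is being encoded.

```go
package main

import "sync"

// object is a hypothetical stand-in for one data object read from storage.
type object struct {
	ID   string
	Data []byte
}

// addVectors processes ids in order, reading at most one object ahead.
// fetch and encode are hypothetical hooks for the storage read and the
// CPU-bound vector encode.
func addVectors(ids []string, fetch func(id string) ([]byte, error), encode func(object) error) error {
	type result struct {
		obj object
		err error
	}
	done := make(chan struct{})
	defer close(done) // tells the reader to stop if we return early

	// Unbuffered: the reader blocks on send until the consumer is ready,
	// so at most two objects are resident (one encoding, one prefetched).
	ch := make(chan result)
	go func() {
		defer close(ch)
		for _, id := range ids {
			data, err := fetch(id)
			select {
			case ch <- result{object{ID: id, Data: data}, err}:
			case <-done:
				return
			}
			if err != nil {
				return
			}
		}
	}()

	for r := range ch {
		if r.err != nil {
			return r.err
		}
		if err := encode(r.obj); err != nil {
			return err
		}
	}
	return nil
}

// compressColumns compresses each column concurrently, using at most
// `workers` goroutines at a time. compress is a hypothetical hook for the
// per-vector compression step.
func compressColumns(cols [][]byte, workers int, compress func([]byte) []byte) [][]byte {
	if workers < 1 {
		workers = 1
	}
	out := make([][]byte, len(cols))
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for i, col := range cols {
		wg.Add(1)
		sem <- struct{}{} // wait for a free worker slot
		go func(i int, col []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			out[i] = compress(col) // each goroutine writes its own slot
		}(i, col)
	}
	wg.Wait()
	return out
}
```

In a real implementation a `context.Context` would probably be threaded through instead of the `done` channel, and the worker limit would need to account for the per-object memory footprint mentioned above.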

Jul 20 '22 15:07 mccanne