
Optimize performance: recommended blockSize (N5) / chunks (zarr)?

Open · FelixSchwarz opened this issue on Jan 13 '23 · 1 comment

I'm trying to store some data using TensorStore, but I'm struggling to achieve acceptable performance. Unfortunately, I did not find any performance optimization hints in the documentation.

Let's say I have a dataset with 5000 entries, each consisting of a float32 tensor of shape [100, 1500, 2000], so the final dimensions are [5000, 100, 1500, 2000].

I'll always read/write a single entry ([100, 1500, 2000]) at a time. I noticed that completely omitting the "blockSize" setting reduces performance quite a lot; blockSize: [1, 100, 150, 2000] is much better, but still way slower than numpy.save (roughly doubling the required time). Is there a way to increase performance to a level similar to plain numpy?
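For context, a minimal sketch of the setup being described, assuming the N5 driver over a local file kvstore; the path, compression setting, and the index 42 are placeholders, not from my actual code:

```python
import numpy as np
import tensorstore as ts

# Minimal sketch of the scenario above (N5 driver, local files).
# The path and compression choice are illustrative assumptions.
dataset = ts.open({
    "driver": "n5",
    "kvstore": {"driver": "file", "path": "/tmp/example_n5"},
    "metadata": {
        "dimensions": [5000, 100, 1500, 2000],
        "blockSize": [1, 100, 150, 2000],   # the setting being asked about
        "dataType": "float32",
        "compression": {"type": "raw"},
    },
}, create=True).result()

# Each access touches exactly one entry along the first axis.
entry = np.zeros((100, 1500, 2000), dtype=np.float32)
dataset[42].write(entry).result()
```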

FelixSchwarz · Jan 13 '23 16:01

Have you considered using a blockSize of [1, 100, 1500, 2000]?

Also, if you can provide some simple but complete example code that demonstrates the performance discrepancy between tensorstore and numpy.save, that would be very helpful in getting to the bottom of it and figuring out which changes, if any, are needed in tensorstore to improve performance for this use case.
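Something along these lines would already be useful as a starting point. This is only a rough sketch, not a statement about your workload: the shapes are scaled down so it runs quickly, the blocks are one-per-entry (no compression) to stay close to what numpy.save does, and the paths and entry count are placeholders:

```python
import os
import tempfile
import time

import numpy as np
import tensorstore as ts

# Scaled-down benchmark sketch; shapes and counts are illustrative only.
entry_shape = (100, 150, 200)
n_entries = 10
data = np.random.rand(*entry_shape).astype(np.float32)
tmpdir = tempfile.mkdtemp()

# TensorStore / N5: one block per entry, no compression.
dataset = ts.open({
    "driver": "n5",
    "kvstore": {"driver": "file", "path": os.path.join(tmpdir, "n5")},
    "metadata": {
        "dimensions": [n_entries, *entry_shape],
        "blockSize": [1, *entry_shape],
        "dataType": "float32",
        "compression": {"type": "raw"},
    },
}, create=True).result()

start = time.perf_counter()
for i in range(n_entries):
    dataset[i].write(data).result()
ts_elapsed = time.perf_counter() - start

# Plain numpy.save: one .npy file per entry.
start = time.perf_counter()
for i in range(n_entries):
    np.save(os.path.join(tmpdir, f"entry_{i}.npy"), data)
np_elapsed = time.perf_counter() - start

print(f"tensorstore: {ts_elapsed:.3f}s  numpy.save: {np_elapsed:.3f}s")
```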

jbms · Jan 13 '23 18:01

Sorry for not following up. I'm not working on the project anymore, so I can't provide more details.

FelixSchwarz · Apr 18 '24 06:04