kartothek
Investigate if we can replace BlockBuffer with pyarrow buffer
Problem description
As commented by @NeroCorleone in #397:
We have seen weird IOErrors on long-running ktk/dask computations that have caused incidents.
These errors happen while reading Parquet files from Azure blob.
One possible cause is kartothek's custom BlockBuffer implementation. It was useful when it was written, but we can look into replacing it with pyarrow's own buffering. That way we no longer have to maintain this complex piece of code, and we can rule it out as a source of the problem.
We'll want to check potential performance implications of this change.
Implementation hint: fa2af5c (#397)