Investigate if we can replace BlockBuffer with pyarrow buffer

Open lr4d opened this issue 4 years ago • 0 comments

Problem description

As commented by @NeroCorleone in #397:

We have seen weird IOErrors on long running ktk/dask computations that have caused incidents

These errors happen while reading Parquet files from Azure blob.

One possible cause is kartothek's custom implementation of BlockBuffer. While it was useful when it was written, we can look into replacing it with the pyarrow buffer, so that we no longer need to maintain this complex piece of code and can rule it out as a source of the problem.

We'll want to check potential performance implications of this change.
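One way to approach the performance check is to count how many requests reach the storage backend under each buffering strategy. The sketch below is an illustration only: it uses the standard library's `io.BufferedReader` as a stand-in for whichever buffer is under test, and `CountingRawIO` and `count_backend_reads` are hypothetical helpers, not part of kartothek or pyarrow.

```python
import io


class CountingRawIO(io.RawIOBase):
    """In-memory stand-in for a remote blob that counts backend read requests.

    Hypothetical helper for illustration; not kartothek or pyarrow code.
    """

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data
        self._pos = 0
        self.read_calls = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = len(self._data) + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        self._pos = max(0, pos)
        return self._pos

    def readinto(self, b):
        # Every call here models one round trip to the storage backend.
        self.read_calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def count_backend_reads(make_reader, data, request_size=64):
    """Read `data` in small requests and return the number of backend reads."""
    raw = CountingRawIO(data)
    reader = make_reader(raw)
    out = bytearray()
    while True:
        chunk = reader.read(request_size)
        if not chunk:
            break
        out += chunk
    # Guard against a buffer that returns wrong or truncated data.
    assert bytes(out) == data
    return raw.read_calls


if __name__ == "__main__":
    payload = bytes(range(256)) * 256  # 64 KiB of test data
    unbuffered = count_backend_reads(lambda raw: raw, payload)
    buffered = count_backend_reads(
        lambda raw: io.BufferedReader(raw, buffer_size=8192), payload
    )
    print(f"unbuffered backend reads: {unbuffered}")
    print(f"buffered backend reads:   {buffered}")
```

The same harness could wrap the existing BlockBuffer and a pyarrow-based reader in place of the `make_reader` lambdas, assuming both expose a file-like `read`. Timing against real Azure blob storage would still be needed to confirm any difference.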

Implementation hint: fa2af5c (#397)

lr4d, Feb 04 '21 10:02