parquet-dotnet
DataColumn sliced buffers
Issue description
I would like the ability to create a DataColumn using this signature:
DataColumn(..., Array data, int startIndex, int length, ...)
where the two parameters startIndex and length are added to indicate the slice of data to actually write.
This would allow the use of fixed length buffers and avoid unnecessary array allocations.
For instance, when writing row groups (column chunks) of size N, I would allocate arrays of length N. If the total number of rows is not a multiple of N, the last row group has fewer than N rows, which currently means allocating another, shorter array just for it.
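To make the use case concrete, here is a rough sketch of the pattern I have in mind. WriteChunk is a hypothetical stand-in for whatever would accept the slice; it is not an existing parquet-dotnet API, and the names SlicedBufferSketch and WriteAll are made up for illustration. The point is only that one buffer of length N is allocated and the short last chunk is passed as a slice of it.

```csharp
using System;
using System.Collections.Generic;

public static class SlicedBufferSketch
{
    // Hypothetical consumer of a slice (NOT a parquet-dotnet API).
    // It only reports how many values it was asked to write.
    public static int WriteChunk(ReadOnlySpan<int> slice) => slice.Length;

    // Reuses a single buffer of length n for every chunk and
    // returns the number of values written per chunk.
    public static List<int> WriteAll(int totalRows, int n)
    {
        var buffer = new int[n];          // the only array allocation
        var sizes = new List<int>();

        for (int start = 0; start < totalRows; start += n)
        {
            // The last chunk may be shorter than n.
            int length = Math.Min(n, totalRows - start);

            // Fill the used part of the buffer with sample data.
            for (int i = 0; i < length; i++)
                buffer[i] = start + i;

            // Pass only the filled part: the equivalent of (data, startIndex: 0, length).
            sizes.Add(WriteChunk(buffer.AsSpan(0, length)));
        }

        return sizes;
    }
}
```

With 10 rows and N = 4, this writes chunks of 4, 4 and 2 values without allocating a second array for the last one.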
Please advise if I am missing something obvious here.
I'd be partial to a Span personally...
It's complicated at the moment, as it touches quite a few internal parts that need to be carefully tested. Some work is underway, so in one of the major releases you might see the DataColumn interface supporting something more modern, or being replaced completely (one of the big issues is backward compatibility).