parquet-dotnet

DataColumn sliced buffers

Open msitt opened this issue 1 year ago • 2 comments

Issue description

I would like to be able to create a DataColumn with this signature: DataColumn(..., Array data, int startIndex, int length, ...), where the two new parameters, startIndex and length, indicate the slice of data to actually write.

This would allow the use of fixed length buffers and avoid unnecessary array allocations.

For instance, when writing row groups (column chunks) of size N, I allocate arrays of length N. If the total number of rows is not a multiple of N, the last row group has fewer than N rows, which requires allocating another, smaller array.
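To make the situation concrete, here is a minimal sketch of the workaround described above. It uses only the .NET base class library; RowGroupBuffers and Chunk are hypothetical names made up for illustration, and the comment about DataColumn assumes a constructor that takes a whole array, as the request implies.

```csharp
using System;
using System.Collections.Generic;

static class RowGroupBuffers
{
    // Hypothetical helper: yields one array per row group.
    // Full groups reuse `buffer`; the short tail is copied into a new,
    // exactly sized array, because (by assumption) the column type
    // writes every element of the array it is given.
    public static IEnumerable<int[]> Chunk(IReadOnlyList<int> rows, int[] buffer)
    {
        int n = buffer.Length;
        for (int start = 0; start < rows.Count; start += n)
        {
            int count = Math.Min(n, rows.Count - start);
            for (int i = 0; i < count; i++)
                buffer[i] = rows[start + i];

            if (count == n)
            {
                yield return buffer;            // no allocation
            }
            else
            {
                var tail = new int[count];      // the extra allocation
                Array.Copy(buffer, tail, count);
                yield return tail;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        var rows = new int[10];
        for (int i = 0; i < rows.Length; i++) rows[i] = i;

        var buffer = new int[4];                // N = 4
        foreach (int[] chunk in RowGroupBuffers.Chunk(rows, buffer))
        {
            // Real code would wrap each chunk in a DataColumn here.
            bool reused = ReferenceEquals(chunk, buffer);
            Console.WriteLine($"{chunk.Length} values, reused buffer: {reused}");
        }
    }
}
```

With 10 rows and N = 4, the first two chunks are the shared buffer and the third is a separate 2-element array. The requested startIndex and length parameters would make that last copy unnecessary.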

Please advise if I am missing something obvious here.

msitt, Jul 08 '23 17:07

I'd be partial to a Span personally...
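As a rough illustration of what a span-based approach could look like, the sketch below slices a fixed buffer without copying. Sum is a hypothetical stand-in for a consumer that accepts a span; nothing here is part of the parquet-dotnet API.

```csharp
using System;

class Program
{
    // Hypothetical consumer: stands in for a writer that accepts a span.
    static long Sum(ReadOnlySpan<int> values)
    {
        long total = 0;
        foreach (int v in values) total += v;
        return total;
    }

    static void Main()
    {
        int[] buffer = { 1, 2, 3, 4 };   // fixed-length buffer
        int valid = 2;                   // only the first two slots hold data

        // AsSpan(start, length) creates a view over the array; no copy is made.
        ReadOnlySpan<int> slice = buffer.AsSpan(0, valid);

        Console.WriteLine(Sum(slice));   // prints 3
    }
}
```

A span carries the start and length together with the data, so a single parameter would cover what startIndex and length express separately.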

serialseb, Oct 09 '23 05:10

It's complicated at the moment, as it touches quite a few internal parts that need to be carefully tested. Some work is under way, so in one of the major releases you might see the DataColumn interface supporting something more modern, or being replaced completely (one of the big issues is backward compatibility).

aloneguid, Nov 16 '23 09:11