parquet-dotnet
[BUG]: If the data transferred to WriteColumnAsync is too large, an error occurs
Library version
latest
Operating systems
Linux, Windows
OS architecture
64 bit
How to reproduce?
- Convert about 1.5 million rows from a database.
- Open a stream and start writing Parquet to it.
- Create a single row group (the requirement is to write everything into one row group).
- Write each column with WriteColumnAsync.
- The write fails with this error:
OverflowException: The array size has exceeded the supported range.
   at Microsoft.IO.RecyclableMemoryStream.ToArray() in /_/src/RecyclableMemoryStream.cs:line 820
   at Parquet.File.DataColumnWriter.CompressAndWriteAsync(PageHeader ph, MemoryStream data, ColumnSizes cs, CancellationToken cancelToken)
   at Parquet.File.DataColumnWriter.WriteColumnAsync(ColumnChunk chunk, DataColumn column, SchemaElement tse, CancellationToken cancelToken)
   at Parquet.File.DataColumnWriter.WriteAsync(FieldPath fullPath, DataColumn column, CancellationToken cancelToken)
   at Parquet.ParquetRowGroupWriter.WriteColumnAsync(DataColumn column, Dictionary`2 customMetadata, CancellationToken cancelToken)
   at Workers.ArchivingWorker.Jobs.ArchivingWebsocketLogsFull.WriteData(ParquetWriter writer, List`1 response) in /src/Workers.ArchivingWorker/Jobs/ArchivingWebsocketLogsFull.cs:line 285
   at Workers.ArchivingWorker.Jobs.ArchivingWebsocketLogsFull.ExecuteAsync(IServiceProvider serviceProvider) in /src/Workers.ArchivingWorker/Jobs/ArchivingWebsocketLogsFull.cs:line 250
Related question: can I go back to the first row group and keep adding to it after it has been closed, that is, after all of its columns have been written?
Failed test
Error:
OverflowException: The array size has exceeded the supported range.
   at Microsoft.IO.RecyclableMemoryStream.ToArray() in /_/src/RecyclableMemoryStream.cs:line 820
   at Parquet.File.DataColumnWriter.CompressAndWriteAsync(PageHeader ph, MemoryStream data, ColumnSizes cs, CancellationToken cancelToken)
   at Parquet.File.DataColumnWriter.WriteColumnAsync(ColumnChunk chunk, DataColumn column, SchemaElement tse, CancellationToken cancelToken)
   at Parquet.File.DataColumnWriter.WriteAsync(FieldPath fullPath, DataColumn column, CancellationToken cancelToken)
   at Parquet.ParquetRowGroupWriter.WriteColumnAsync(DataColumn column, Dictionary`2 customMetadata, CancellationToken cancelToken)
   at Workers.ArchivingWorker.Jobs.ArchivingWebsocketLogsFull.WriteData(ParquetWriter writer, List`1 response) in /src/Workers.ArchivingWorker/Jobs/ArchivingWebsocketLogsFull.cs:line 285
   at Workers.ArchivingWorker.Jobs.ArchivingWebsocketLogsFull.ExecuteAsync(IServiceProvider serviceProvider) in /src/Workers.ArchivingWorker/Jobs/ArchivingWebsocketLogsFull.cs:line 250
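One possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: ToArray() has to return a single byte[], and a .NET array cannot hold more than int.MaxValue elements, so a column whose encoded data for the whole row group exceeds roughly 2 GiB cannot be copied out. The rough sketch below estimates how large the average value may be with 1.5 million rows before one column would hit that ceiling. The helper name MayExceedArrayLimit and the sample values are hypothetical, and the 4-byte length prefix assumes plain-encoded string values with no dictionary encoding or compression taken into account.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Numbers from the report: about 1.5 million rows written into one row group.
const long rowCount = 1_500_000;

// A byte[] in .NET cannot be longer than int.MaxValue elements.
const long maxArrayBytes = int.MaxValue;

// Average encoded bytes per value at which a single column reaches the limit.
long bytesPerValueLimit = maxArrayBytes / rowCount;
Console.WriteLine(bytesPerValueLimit); // prints 1431

// Hypothetical sample data standing in for one string column.
var sample = new List<string?> { "short", null, new string('x', 2_000) };
Console.WriteLine(MayExceedArrayLimit(sample));

// Hypothetical helper (not part of Parquet.Net): estimates the plain-encoded
// size of a string column as UTF-8 bytes plus a 4-byte length prefix per value.
static bool MayExceedArrayLimit(IEnumerable<string?> values)
{
    long total = 0;
    foreach (string? value in values)
    {
        if (value is null)
        {
            continue; // nulls carry no value bytes in this estimate
        }
        total += Encoding.UTF8.GetByteCount(value) + 4;
    }
    return total > int.MaxValue;
}
```

If that estimate is in the right range for the data here, any column averaging more than about 1.4 KB per value would be enough to trigger the exception when everything goes into one row group.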
Sorry I don't understand the issue. Maybe providing a failing test will help i.e. "show me the code" ;)