[BUG] Error when opening file with list field (System.InvalidCastException)
Parquet Viewer Version: Latest (3.1.0)
Where was the parquet file created? pyarrow
Sample File: Generic Array
Describe the bug: Upon loading a file I get hit with:
---------------------------
Specified cast is not valid.
---------------------------
Something went wrong (CTRL+C to copy):
System.InvalidCastException: Specified cast is not valid.
at ParquetViewer.Engine.ParquetEngine.ReadListField(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int32 rowBeginIndex, ParquetSchemaElement itemField, Int32 fieldIndex, Int64 skipRecords, Int64 readRecords, Boolean isFirstColumn, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.Engine.ParquetEngine.ProcessRowGroup(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int64 skipRecords, Int64 readRecords, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.Engine.ParquetEngine.PopulateDataTable(DataTableLite dataTable, ParquetReader parquetReader, Int64 offset, Int64 recordCount, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.Engine.ParquetEngine.ReadRowsAsync(List`1 selectedFields, Int32 offset, Int32 recordCount, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.MainForm.<>c__DisplayClass33_0.<<LoadFileToGridview>b__1>d.MoveNext()
--- End of stack trace from previous location ---
at ParquetViewer.MainForm.LoadFileToGridview()
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_0(Object state)
at InvokeStub_SendOrPostCallback.Invoke(Object, Span`1)
at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
---------------------------
OK
---------------------------
Can you share a sample file please? You can zip and upload it here. Otherwise I might not be able to help much.
Also give version 3.2.0.0 a try. I doubt it has addressed your issue, but I wanted to suggest it just in case 🤞🏼.
I believe this is due to list fields with nested struct values. Implementing it has proven challenging.
Maybe related: I'm getting the following error. And yes, it is a parquet file with a nested struct.
---------------------------
Something went wrong
---------------------------
Could not load parquet file.
If the problem persists please consider opening a bug ticket in the project repo: Help → About
ParquetViewer.Engine.Exceptions.FileReadException: Encountered an error reading file.
---> System.InvalidOperationException: don't know how to skip type Set
at Parquet.Meta.Proto.ThriftCompactProtocolReader.SkipField(CompactType compactType)
at Parquet.Meta.ColumnChunk.Read(ThriftCompactProtocolReader proto)
at Parquet.Meta.RowGroup.Read(ThriftCompactProtocolReader proto)
at Parquet.Meta.FileMetaData.Read(ThriftCompactProtocolReader proto)
at Parquet.ParquetActor.ReadMetadataAsync(CancellationToken cancellationToken)
at Parquet.ParquetReader.InitialiseAsync(CancellationToken cancellationToken)
at Parquet.ParquetReader.CreateAsync(String filePath, ParquetOptions parquetOptions, CancellationToken cancellationToken)
at ParquetViewer.Engine.ParquetEngine.OpenFileAsync(String parquetFilePath, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at ParquetViewer.Engine.ParquetEngine.OpenFileAsync(String parquetFilePath, CancellationToken cancellationToken)
at ParquetViewer.MainForm.OpenFieldSelectionDialog(Boolean forceOpenDialog)
@toomyem Which version of the app are you using? Have you tried v3.2.1.0?
Nope, still on 3.2.0, as it is the latest release.
I just checked 3.2.1.0 - works much better and opens my file correctly 👍 Thank you.
After some testing, it looks like it is not handled 100% correctly.
Example schema:
message test-msg {
  required int32 data_required32;
  required int64 data_required64;
  optional binary data_optional_missing (UTF8);
  optional binary data_optional_existing (UTF8);
  repeated int32 data_repeated;
  repeated group data_group {
    required binary nested (UTF8);
  }
  required binary uuid (UTF8);
  required group data_list (LIST) {
    repeated group list {
      required binary element (UTF8);
    }
  }
}
Test parquet file: test.zip Parquet file without repetition: test-no-repeat.zip
Error while opening:
---------------------------
Specified cast is not valid.
---------------------------
Something went wrong (CTRL+C to copy):
System.InvalidCastException: Specified cast is not valid.
at System.Data.Common.Int32Storage.Set(Int32 record, Object value)
at System.Data.DataColumn.set_Item(Int32 record, Object value)
The problem occurs when the field data_repeated (repeated int32) has more than one value in a record.
Works:
record.add("data_repeated", 10);
Fails:
record.add("data_repeated", 10);
record.add("data_repeated", 20);
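For context on the failing case: Parquet stores a repeated field as one flat column of values plus a repetition level per value, so a reader has to regroup the values into one list per row before putting them into a cell. The sketch below only illustrates that regrouping. It is not ParquetViewer's or parquet-dotnet's code, the class and method names are made up, and it assumes a top-level repeated primitive where every row has at least one value (definition levels are ignored).

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatedFieldSketch {

    /**
     * Groups a flat column of values into one list per row.
     * Repetition level 0 marks the first value of a new row;
     * level 1 means "another value for the current row".
     */
    static List<List<Integer>> groupByRepetitionLevel(int[] values, int[] repetitionLevels) {
        if (values.length != repetitionLevels.length) {
            throw new IllegalArgumentException("values and repetition levels must have the same length");
        }
        List<List<Integer>> rows = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0) {
                rows.add(new ArrayList<>()); // start a new row
            }
            if (rows.isEmpty()) {
                throw new IllegalArgumentException("the first repetition level must be 0");
            }
            rows.get(rows.size() - 1).add(values[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Row 1 holds 10 and 20 (the failing case above), row 2 holds only 30.
        System.out.println(groupByRepetitionLevel(new int[] {10, 20, 30}, new int[] {0, 1, 0}));
        // prints [[10, 20], [30]]
    }
}
```

The point of the sketch is that the cell for data_repeated ends up holding a list, not a single int, which is consistent with the Int32 column rejecting the value in the stack trace above.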
Moreover, required group data_list (LIST) is being read as an empty list ([]).
Snippet used to create parquet file:
try (InputStream inputStream = MessageTypeProvider.class.getResourceAsStream("/test-msg.message")) {
    String schema = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
    MessageType msgType = MessageTypeParser.parseMessageType(schema);
    try (ParquetWriter<Group> writer = new ParquetWriterBuilder(Path.of("test.parquet"), msgType).withWriteMode(OVERWRITE).build()) {
        Group record = new SimpleGroupFactory(msgType).newGroup();
        record.add("uuid", UUID.randomUUID().toString());
        record.add("data_required32", 32);
        record.add("data_required64", 64L);
        record.add("data_optional_existing", "hello");
        record.add("data_repeated", 10);
        record.add("data_repeated", 20);
        Group data = record.addGroup("data_group");
        data.add("nested", "nested!");
        Group list = record.addGroup("data_list");
        Group el1 = list.addGroup("list");
        el1.add("element", "element1");
        Group el2 = list.addGroup("list");
        el2.add("element", "element2");
        writer.write(record);
    }
}
I'm using v3.2.1 and getting this error:
---------------------------
Unable to cast object of type 'ParquetViewer.Engine.Types.StructValue' to type 'ParquetViewer.Engine.Types.ListValue'.
---------------------------
Something went wrong (CTRL+C to copy):
System.InvalidCastException: Unable to cast object of type 'ParquetViewer.Engine.Types.StructValue' to type 'ParquetViewer.Engine.Types.ListValue'.
at ParquetViewer.Engine.ParquetEngine.ReadListField(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int32 rowBeginIndex, ParquetSchemaElement itemField, Int32 fieldIndex, Int64 skipRecords, Int64 readRecords, Boolean isFirstColumn, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.Engine.ParquetEngine.ProcessRowGroup(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int64 skipRecords, Int64 readRecords, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.Engine.ParquetEngine.PopulateDataTable(DataTableLite dataTable, ParquetReader parquetReader, Int64 offset, Int64 recordCount, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.Engine.ParquetEngine.ReadRowsAsync(List`1 selectedFields, Int32 offset, Int32 recordCount, CancellationToken cancellationToken, IProgress`1 progress)
at ParquetViewer.MainForm.<>c__DisplayClass33_0.<<LoadFileToGridview>b__1>d.MoveNext()
--- End of stack trace from previous location ---
at ParquetViewer.MainForm.LoadFileToGridview()
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_0(Object state)
at InvokeStub_SendOrPostCallback.Invoke(Object, Span`1)
at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
---------------------------
OK
---------------------------
This happens when opening a parquet file with the following schema:
|--...
|-- experience: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- duration: string (nullable = true)
| | |-- end_time: string (nullable = true)
| | |-- job_description: string (nullable = true)
| | |-- location: string (nullable = true)
| | |-- position: string (nullable = true)
| | |-- start_time: string (nullable = true)
| | |-- company: struct (nullable = false)
| | | |-- name: string (nullable = true)
| | | |-- image_url: string (nullable = true)
| | | |-- social_url: string (nullable = true)
|-- ...
The file does open when removing the nested struct.
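To illustrate why a list of structs that themselves contain structs can trip up a reader, here is a minimal sketch that walks a nested value by checking its runtime type instead of casting blindly. StructValue and ListValue here are hypothetical Java stand-ins named after the types in the error message, not the actual ParquetViewer classes, and the sample values are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedValueSketch {

    // Hypothetical stand-in for a struct: an ordered map of field name to value.
    static final class StructValue {
        final Map<String, Object> fields;
        StructValue(Map<String, Object> fields) { this.fields = fields; }
    }

    // Hypothetical stand-in for a list: its items may be scalars, structs, or lists.
    static final class ListValue {
        final List<Object> items;
        ListValue(List<Object> items) { this.items = items; }
    }

    /** Renders a value recursively, dispatching on its runtime type rather than assuming one. */
    static String render(Object value) {
        if (value instanceof ListValue) {
            return ((ListValue) value).items.stream()
                    .map(NestedValueSketch::render)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        if (value instanceof StructValue) {
            return ((StructValue) value).fields.entrySet().stream()
                    .map(e -> e.getKey() + ":" + render(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        return String.valueOf(value); // scalar (or null)
    }

    public static void main(String[] args) {
        // Mirrors the shape of the schema above: experience is a list of structs,
        // and each struct has a nested company struct. Values are invented.
        Map<String, Object> company = new LinkedHashMap<>();
        company.put("name", "Acme");
        Map<String, Object> job = new LinkedHashMap<>();
        job.put("position", "Engineer");
        job.put("company", new StructValue(company));
        Object experience = new ListValue(Arrays.<Object>asList(new StructValue(job)));
        System.out.println(render(experience));
        // prints [{position:Engineer,company:{name:Acme}}]
    }
}
```

Code that instead assumed every value under experience was a ListValue would fail with a cast error on the first struct, which resembles the exception reported here.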
Test parquet file: test.zip Parquet file without repetition: test-no-repeat.zip
I was able to address the issues with the data_list and data_repeated fields in v3.5.0. However, the issue with data_group needs more investigation. I opened this issue in the parquet-dotnet repo to track it: https://github.com/aloneguid/parquet-dotnet/issues/681
https://github.com/aloneguid/parquet-dotnet/issues/681 was fixed and released 🥳 So I updated v3.5.0 with the latest pre-release which addresses the remaining issue.
With all test file issues addressed I'm going to close this ticket out but feel free to open a new one, folks, with a new test file 🙏🏾