parquet-dotnet
Exception "cannot find data type handler to create model schema for [n: CONTACT_ID, t: BYTE_ARRAY, ct: DECIMAL, rt: OPTIONAL, c: 0]"
Version: Parquet.Net v3.9.1
Runtime Version: .NET 5.0
OS: Windows 10
Expected behavior
I would expect Parquet.Net to identify the DataTypeHandler appropriately for the column in question.
Actual behavior
Parquet.Net is throwing the aforementioned exception when I access parquetReader.Schema.
From ParquetViewer, the column in question is:
"Schema": [
{
"Field_id": 0,
"Name": "CONTACT_ID",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": null,
"Scale": 4,
"Precision": 23,
"Repetition_type": "OPTIONAL",
"Converted_type": "DECIMAL"
},
Steps to reproduce the behavior
- Step 1 - open a ParquetReader against the file.
- Step 2 - access ParquetReader.Schema when one of the columns is as described above
Code snippet reproducing the behavior
using (Stream fileStream = System.IO.File.OpenRead(fileName))
{
    using (var parquetReader = new ParquetReader(fileStream))
    {
        // The InvalidOperationException is thrown by this property access.
        Schema schema = parquetReader.Schema;
    }
}
I am unfortunately unable to share a copy of the file in question. If there is a tool that will allow me to extract a subset easily, I could likely share a file with just this column.
Sorry to pester, is this a known issue? Am I perhaps doing something wrong? Cheers
This seems like a newer Parquet format addition. Your column uses a variable-size byte array (BYTE_ARRAY) to represent decimals. The specification allows four physical representations for DECIMAL, and Parquet.Net implements the first three.
No need to attach test files, but I'd appreciate validating the fix when you have a chance.
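To make the BYTE_ARRAY representation concrete, here is a minimal sketch of how such a value can be decoded by hand. It assumes the encoding described in the Parquet format specification: the bytes hold the unscaled integer as big-endian two's complement, and the decimal value is that integer divided by 10 raised to the column's scale. The class and method names (ParquetDecimalSketch, Decode) are hypothetical and are not part of Parquet.Net; this is not the library's own implementation.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not a Parquet.Net API.
public static class ParquetDecimalSketch
{
    // Decodes a DECIMAL stored as a variable-size byte array.
    // Assumption: bytes are the unscaled value in big-endian two's complement.
    public static decimal Decode(byte[] bigEndianBytes, int scale)
    {
        // BigInteger(byte[]) expects little-endian two's complement,
        // so reverse a copy rather than mutating the caller's buffer.
        byte[] littleEndian = (byte[])bigEndianBytes.Clone();
        Array.Reverse(littleEndian);
        BigInteger unscaled = new BigInteger(littleEndian);

        // Build 10^scale as a decimal; System.Decimal supports scales up to 28.
        decimal divisor = 1m;
        for (int i = 0; i < scale; i++)
        {
            divisor *= 10m;
        }

        // The explicit conversion throws OverflowException if the unscaled
        // value does not fit in System.Decimal.
        return (decimal)unscaled / divisor;
    }
}
```

For example, with the scale of 4 reported for CONTACT_ID, the three bytes 0x12 0xD6 0x87 (the integer 1234567) decode to 123.4567. Going through BigInteger keeps the sketch independent of the array length, which is the point of the variable-size form.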
Sorry for the delay. I updated to 4.2.2 and am still receiving the error.
"Exception": {
"ClassName": "System.InvalidOperationException",
"Message": "cannot find data type handler to create model schema for [n: CONTACT_ID, t: BYTE_ARRAY, ct: DECIMAL, rt: OPTIONAL, c: 0]",
"Data": null,
"InnerException": null,
"HelpURL": null,
"StackTraceString": " at Parquet.File.ThriftFooter.CreateModelSchema(FieldPath path, IList`1 container, Int32 childCount, Int32& si, ParquetOptions formatOptions)\r\n at Parquet.File.ThriftFooter.CreateModelSchema(ParquetOptions formatOptions)\r\n at Parquet.ParquetReader.get_Schema()\r\n
Not ready yet )
My bad, I'll wait patiently :)
Thanks @jchristn. This is supported in 4.2.3, if you could validate and confirm. I used one of the official test data files from the parquet repo to validate this, but couldn't force Spark to write a file like that, so I'm wondering which system produced data in this format?
Hi @aloneguid, I'm not sure what system was used to create the file :( When I try with v4.2.3, this is displayed on the console: RUNTIME::: win10-x64 SEARCHPATH::: and after that, it seems to work.
Some debug logging to remove )
There still seems to be an issue with one file in particular. Is there any way I could PM or email you with more details?
Email me [email protected].
Issue resolved in v4.3.3