[BUG] Error when opening decimal values from SAP

Open beggsdl opened this issue 11 months ago • 6 comments

Parquet Viewer Version 3.2.1.0 SC

Where was the parquet file created? Theobald Xtract Universal

Sample File TCURR table from SAP

Sample data:
MANDT;KURST;FCURR;TCURR;GDATU;UKURS;FFACT;TFACT
400;EOM;CAD;EUR;79889168;0.70720;0;0
400;EOM;USD;CAD;79889168;0.97860;0;0
400;EOM;USD;EUR;79889168;0.69200;0;0

Describe the bug The TCURR table in SAP has been extracted using Theobald Xtract Universal and written to a parquet file. When trying to open the resulting file in ParquetViewer, we get the error below. Is this an issue with the file creation, or with how ParquetViewer is reading the file?

Screenshots Here is the error:


scale must be less than or equal to the precision (Parameter 'scale')

Something went wrong (CTRL+C to copy):
System.ArgumentException: scale must be less than or equal to the precision (Parameter 'scale')
   at Parquet.Schema.DecimalDataField..ctor(String name, Int32 precision, Int32 scale, Boolean forceByteArrayEncoding, Nullable`1 isNullable, Nullable`1 isArray, String propertyName)
   at Parquet.Encodings.SchemaEncoder.GetDecimalDataField(SchemaElement se)
   at Parquet.Encodings.SchemaEncoder.TryBuildDataField(SchemaElement se, ParquetOptions options, DataField& df)
   at Parquet.Encodings.SchemaEncoder.Decode(List`1 schema, ParquetOptions options, Int32& index, Int32& ownedChildCount)
   at Parquet.File.ThriftFooter.CreateModelSchema(FieldPath path, IList`1 container, Int32 childCount, Int32& si, ParquetOptions formatOptions)
   at Parquet.File.ThriftFooter.CreateModelSchema(ParquetOptions formatOptions)
   at Parquet.ParquetReader.get_Schema()
   at ParquetViewer.Engine.ParquetEngine.get_Schema()
   at ParquetViewer.MainForm.OpenFieldSelectionDialog(Boolean forceOpenDialog)

OK

beggsdl avatar Feb 10 '25 16:02 beggsdl

Can you open the file using other tools? Can you try https://www.parquet-viewer.com/ for example?

Are you also able to read the file back into the system you extracted it from? This would confirm the file is at least not malformed.

mukunku avatar Feb 11 '25 01:02 mukunku

parquet-viewer.com is blocked in our organization, but I did use the viewer at dataconverter.io. It did read the file successfully. I am not able to read the file back into the tool that created it because the only data source options are SAP objects.

Here are screenshots of the results from dataconverter.io:

Image

Image

Image

beggsdl avatar Feb 11 '25 10:02 beggsdl

Here is the TCURR parquet file. There isn't any sensitive data in it, just exchange rates. I have three "Compatibility mode" options when creating the file: Pure, Spark, and BigQuery. I get the same results with any of them. This file uses Pure.

TCURR Pure.zip

beggsdl avatar Feb 11 '25 16:02 beggsdl

Maybe it has something to do with the last three columns being defined in the schema as DECIMAL(9,x)? In another application we use, the file could not be read until we changed the format of the last columns to FLOAT.

beggsdl avatar Feb 11 '25 17:02 beggsdl

This is all helpful info, thanks.

mukunku avatar Feb 12 '25 13:02 mukunku

Thanks again for all the details. As you alluded to, the issue is caused by the DECIMAL(9,0) fields in your file. It seems the Xtract Universal plugin you're using doesn't define the scale of the decimals when creating the parquet file: it saves the precision as 9 but leaves the scale as null instead of 0.
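
To illustrate the idea (this is not the code from the PR linked below, just a minimal Python sketch), a reader could treat a missing scale as 0 before validating it against the precision. `DecimalMeta`, `resolve_scale`, and `to_decimal` are hypothetical names made up for this example, and the scale of 5 used for UKURS is only an assumption based on the sample values above.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class DecimalMeta:
    """Hypothetical stand-in for the decimal metadata of one parquet column."""
    name: str
    precision: int
    scale: Optional[int]  # None models a writer that left the scale unset


def resolve_scale(meta: DecimalMeta) -> int:
    """Return a usable scale, treating a missing scale as 0."""
    scale = 0 if meta.scale is None else meta.scale
    if scale < 0 or scale > meta.precision:
        raise ValueError(
            f"{meta.name}: scale {scale} must be between 0 and "
            f"precision {meta.precision}"
        )
    return scale


def to_decimal(unscaled: int, meta: DecimalMeta) -> Decimal:
    """Interpret a stored integer as unscaled * 10**(-scale)."""
    return Decimal(unscaled).scaleb(-resolve_scale(meta))


# FFACT as described in this thread: precision 9, scale not set by the writer.
ffact = DecimalMeta(name="FFACT", precision=9, scale=None)
print(resolve_scale(ffact))

# UKURS with an assumed scale of 5, matching sample values such as 0.70720.
ukurs = DecimalMeta(name="UKURS", precision=9, scale=5)
print(to_decimal(70720, ukurs))
```

With the missing scale defaulted to 0, the DECIMAL(9, null) columns behave like DECIMAL(9,0), while a scale that really is larger than the precision is still rejected.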

I opened a PR in the Parquet.NET library we use to process parquet files with a fix: https://github.com/aloneguid/parquet-dotnet/pull/602

Let's see what they think. If we can get that PR merged, I'll update ParquetViewer so that you can open your files.

mukunku avatar Mar 04 '25 15:03 mukunku

The PR to the parquet-dotnet repo has been merged and released! You can now use v3.4.0 to open your files 🙌🏼

Thanks for reporting this issue and sharing a sample file.

mukunku avatar Aug 14 '25 15:08 mukunku