ParquetSharp
Support writing arbitrary precision decimal columns
Hello, I am trying to copy data from an MS SQL table with a NUMERIC(18,9) column "num" to a Redshift table with a NUMERIC(18,9) column. When I try to load the data into Redshift I get an error:

ERROR [XX000] [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR: Spectrum Scan Error
DETAIL:
error: Spectrum Scan Error
code: 15001
context: File 'https://s3.us-east-1.amazonaws.com/xyz/new_line.parquet' column 'num' has an invalid type length. Expecting: 8 len in file: 13
query: 50511242
location: dory_util.cpp:1141
process: worker_thread [pid=12646]
When I changed the destination column precision to 29, I was able to load the data. I would like the destination table precision to be the same as the source table's. Thanks, Dmitriy
I'm not familiar with Redshift, but this documentation suggests it's possible to load data from a Parquet file that uses a different decimal precision by using an external table definition: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-spectrum-data-errors/
It would be a fair amount of work to implement support for arbitrary precision decimal columns in ParquetSharp, and it's not currently a high-priority feature.
"To resolve the invalid type length error in Redshift Spectrum, use an external table definition. The table definition must match the "precision" and "scale" values defined in the external file. For example:
create external table ext_schema.tablename (c1 int, c2 decimal (6,2)) stored as PARQUET location 's3://.../.../'; In this example, the updated values (in the c2 decimal column) for "precision" and "scale" values are set to 6 and 2, respectively. Therefore, the CREATE EXTERNAL TABLE definition values listed in the c2 column must match the values defined in the Apache Parquet file."
This is not a critical issue because I am loading data into staging tables, but it looks like other libraries can do this without a problem. Can you please put this issue on your to-do list? Thanks, Dmitriy
Yes, I will leave this issue open to track this feature.
Hello, any updates on decimal columns?
Hi @dburtsev, no sorry, I can't give an update on when this will be fixed yet. There have been some small improvements to decimal data reading, so that decimals stored as int32 or int64 can now be read, but no progress on the write side.
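For anyone who needs to read decimal data in the meantime, here is a rough read-side sketch with the current API; the file name and column index are placeholders only, and it assumes a required (non-nullable) decimal column:

```csharp
using System;
using ParquetSharp;

// Minimal read-side sketch. "new_line.parquet" and column index 0 are
// illustrative only; substitute your own file and column.
using var reader = new ParquetFileReader("new_line.parquet");
using var rowGroup = reader.RowGroup(0);
var numRows = checked((int) rowGroup.MetaData.NumRows);

// LogicalReader<decimal> surfaces the values as System.Decimal regardless of
// how the file physically stores them. Use LogicalReader<decimal?> instead if
// the column is optional.
using var columnReader = rowGroup.Column(0).LogicalReader<decimal>();
decimal[] values = columnReader.ReadAll(numRows);

foreach (var value in values)
{
    Console.WriteLine(value);
}
```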
I just ran into this and hand-jammed the decimals to be 29, and I'm still scratching my head over the whys and wherefores:
new PrimitiveNode("Tot_Drug_Cst", Repetition.Optional, LogicalType.Decimal(29, 2), PhysicalType.ByteArray)
I made a start on fixing this a while ago, but it dropped off my priorities, sorry. I think I had it mostly working, so I will try to pick this up again soon.