delta-rs
Handling of decimals in scientific notation
Environment
Delta-rs version: 0.17.0
Binding: Python
Bug
Reading a parquet file that has fields of type decimal(32,16), some of which hold zero values, and then calling write_deltalake results in this error:
Exception: Parser error: can't parse the string value 0E-16 to decimal
According to 22171, if the scale of a decimal type is > 6, a 0 value is shown in scientific notation.
What happened: write_deltalake writes the delta table to disk and the decimal columns with 0 values are written as 0; however, it is unclear whether the metadata is written correctly.
What you expected to happen: write_deltalake writes the delta table without error.
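To illustrate the threshold described above, here is a small sketch using Python's standard `decimal` module, which follows the same string rule for zero values. It is only an illustration of where a string like 0E-16 comes from, not the code path delta-rs uses; the helper name `zero_with_scale` is made up for this example.

```python
from decimal import Decimal


def zero_with_scale(scale: int) -> Decimal:
    """Build a zero value with the given number of fractional digits.

    Hypothetical helper for illustration only.
    """
    # Tuple form: (sign, digits, exponent); exponent = -scale.
    return Decimal((0, (0,), -scale))


# Up to scale 6 the default string form stays in plain notation.
print(str(zero_with_scale(6)))   # 0.000000
# Beyond scale 6 the default string form switches to scientific notation.
print(str(zero_with_scale(7)))   # 0E-7
print(str(zero_with_scale(16)))  # 0E-16

# The "f" format spec forces plain notation regardless of scale.
print(format(zero_with_scale(16), "f"))  # 0.0000000000000000
```

The last line shows that the value itself is unchanged; only its text rendering differs, which is what a parser without exponent support would trip over.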
Same issue #2193
It appears that zero values are written successfully if the batch also contains records with non-zero values. If there is only one record in the batch, the zero value is not written. In both cases the error is the same:
Parser error: can't parse the string value 0.0 to decimal.
This is caused by the upstream JSON parser in Arrow-RS, which does not support parsing decimals written in scientific notation.
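As a rough sketch of that failure mode, the following Python snippet models a decimal parser that accepts only plain notation and shows that rewriting the string in plain notation first avoids the error. Both `parse_plain_decimal` and `to_plain_notation` are hypothetical names for this example; this is not Arrow-RS code, and it only covers the scientific notation case.

```python
import re
from decimal import Decimal

# Plain notation only: optional sign, digits, optional fractional part.
_PLAIN_DECIMAL = re.compile(r"^-?\d+(\.\d+)?$")


def parse_plain_decimal(text: str) -> Decimal:
    """Hypothetical strict parser that rejects exponent notation."""
    if not _PLAIN_DECIMAL.match(text):
        raise ValueError(f"can't parse the string value {text} to decimal")
    return Decimal(text)


def to_plain_notation(text: str) -> str:
    """Rewrite a decimal string (possibly scientific) in plain notation."""
    return format(Decimal(text), "f")


try:
    parse_plain_decimal("0E-16")
except ValueError as err:
    print(err)

# Normalizing first keeps the scale and lets the strict parser succeed.
plain = to_plain_notation("0E-16")
print(plain)
print(parse_plain_decimal(plain) == 0)
```

Normalizing at the point where the string is produced, or teaching the parser to accept exponents, are the two obvious places such a fix could live; which one applies here depends on the upstream code.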
The decimal in scientific notation issue occurs not only when writing data, but also when I try to read the schema from a delta table. I have the following code:
deltaTable = DeltaTable(file_uri, storage_options=deltalake_storage_options(storage))
pyarrow_schema = deltaTable.schema().to_pyarrow()
return pyarrow_schema_to_sqlalchemy_table(pyarrow_schema, name=table_name, schema=schema_name, metadata=metadata)
This results in the error: Parser error: can't parse the string value 0E-16 to decimal
This error did not happen in version 0.14.0, but since 0.15.0 I get this error in my code when the table has a decimal type.