spark-avro
Support for logical datatypes like Decimal type
Avro does not support very large numbers directly. It supports them through logicalTypes: the value is stored as a string, and the field's actual data type is given by logicalType. The following is an example of a decimal type:
{"type": "record", "name": "test", "fields" : [ {"name": "a","type": "string", "logicalType": "decimal", "precision": 4, "scale": 2 }, {"name": "b", "type": "string"} ]}
This pull request adds support for reading such data types. For now I have added only the decimal type, but more logicalTypes can be added. In a DataFrame, the field appears as decimal(4,2).
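As a rough illustration of the mapping described above (not the code in this pull request), the following Python sketch reads the field-level logicalType, precision and scale attributes from the example schema and converts a string value to a decimal. The helper name `read_decimal_field` and the sample record are assumptions made for the example.

```python
import json
from decimal import Decimal

# The example schema from this pull request, with logicalType set on the field.
SCHEMA = json.loads("""
{"type": "record", "name": "test", "fields": [
  {"name": "a", "type": "string", "logicalType": "decimal", "precision": 4, "scale": 2},
  {"name": "b", "type": "string"}
]}
""")


def read_decimal_field(field, raw):
    """Hypothetical helper: convert a string value to Decimal when the field is tagged as decimal."""
    if field.get("logicalType") != "decimal":
        return raw  # plain string field, returned unchanged
    precision, scale = field["precision"], field["scale"]
    # Fix the number of fractional digits to `scale`, e.g. 12.5 becomes 12.50.
    value = Decimal(raw).quantize(Decimal(1).scaleb(-scale))
    # Reject values that need more total digits than `precision` allows.
    if len(value.as_tuple().digits) > precision:
        raise ValueError(f"{raw} does not fit decimal({precision},{scale})")
    return value


# Sample record (made up for illustration).
record = {"a": "12.5", "b": "hello"}
decoded = {f["name"]: read_decimal_field(f, record[f["name"]]) for f in SCHEMA["fields"]}
print(decoded)
```

Field "a" comes back as a decimal with two fractional digits, while field "b" stays a string; a value such as "123.45" is rejected because it needs five digits.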
Current coverage is 94.02%
Merging #121 into master will increase coverage by +0.57% as of bb290a5
@@            master   #121   diff @@
======================================
  Files            6      6
  Stmts          275    318    +43
  Branches        45     60    +15
  Methods          0      0
======================================
+ Hit            257    299    +42
  Partial          0      0
- Missed          18     19     +1
In addition to adding support for DateType, this fix is important to be able to read and write logical types in spark-bigquery.
To fully incorporate logical types, the logicalType attribute should be set for all logical types in convertTypeToAvro() when building Avro schema.
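The idea can be sketched as follows. This is not the convertTypeToAvro() function from spark-avro; `to_avro_type` and its string type names are hypothetical, and the attribute names follow the Avro specification (date annotates int, timestamp-millis annotates long, decimal annotates bytes or fixed). Note that this pull request itself sends decimals as strings rather than bytes.

```python
def to_avro_type(kind, precision=None, scale=None):
    """Hypothetical mapping from a simple type name to an Avro schema fragment."""
    if kind == "date":
        # Days since the Unix epoch, stored as an Avro int.
        return {"type": "int", "logicalType": "date"}
    if kind == "timestamp":
        # Milliseconds since the Unix epoch, stored as an Avro long.
        return {"type": "long", "logicalType": "timestamp-millis"}
    if kind == "decimal":
        # Unscaled value stored as bytes; precision and scale are schema attributes.
        return {"type": "bytes", "logicalType": "decimal",
                "precision": precision, "scale": scale}
    # Types without a logical type are passed through as primitive names.
    return kind


fields = [
    {"name": "created", "type": to_avro_type("date")},
    {"name": "amount", "type": to_avro_type("decimal", precision=4, scale=2)},
    {"name": "label", "type": to_avro_type("string")},
]
print(fields)
```

Every branch that corresponds to a logical type emits the logicalType attribute, so a reader of the resulting schema can recover the original type.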
@cpbhagtani Do you have any plans to add support for bytes/decimal types soon? i.e. to support this schema:
{
"type" : "record",
"name" : "avro_decimal1",
"namespace" : "default",
"fields" : [ {
"name" : "dec_col1",
"type" : [ "null", {
"type" : "bytes", <----- bytes type instead of string type
"logicalType" : "decimal", <-----
"precision" : 38,
"scale" : 35
} ],
"default" : null
} ]
}
@progrexor No, currently we are sending decimal as a string type with a logical type.
@cpbhagtani Any plans to update this code to be compatible with the latest version of spark-avro, i.e. to resolve the branch conflicts and support spark-avro 3.2.0?
I am also interested in this issue. Has the approach in this pull request been accepted as the correct fix, so that the only remaining problem is that it does not currently merge cleanly?
Hi,
Sorry for asking the same question again. Do you have any plans to add support for bytes/decimal types soon?
Thanks, Alind
@karthikkadiyam I will make my PR compatible with 3.2.0 soon
@hsn10 my PR is compatible with branch 2.0. I will try to make it compatible with 3.0
@alind-billore decimal is already supported in my PR through avro logical type. bytes is not supported.
@cpbhagtani Thanks a lot for your reply ! :)
@cpbhagtani Any update on when we can expect this PR to be compatible with 3.2? I tried it myself but ran into some runtime exceptions and couldn't fix them.
In Avro spec:
A decimal logical type annotates Avro bytes or fixed types
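For the bytes case, the Avro specification stores the unscaled integer as big-endian two's-complement bytes, with the scale taken from the schema. The Python sketch below shows that encoding under those assumptions; the function names are hypothetical and this is not code from spark-avro.

```python
from decimal import Decimal


def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode big-endian two's-complement bytes into a Decimal with the given scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a tuple is exact and does not depend on the context precision.
    return Decimal((sign, digits, -scale))


def encode_avro_decimal(value: Decimal, scale: int) -> bytes:
    """Encode a Decimal as the unscaled integer in big-endian two's-complement bytes."""
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        raise ValueError(f"{value} has more than {scale} fractional digits")
    unscaled = int("".join(map(str, digits))) * 10 ** shift
    if sign:
        unscaled = -unscaled
    # Enough bytes to hold the value plus a sign bit.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


encoded = encode_avro_decimal(Decimal("12.34"), scale=2)
print(encoded.hex(), decode_avro_decimal(encoded, scale=2))
```

Because the scale lives in the schema rather than in the data, the same bytes decode to different values under different scales, which is why the reader needs the logicalType attributes.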
@cpbhagtani Can you update this PR? I will follow it.
This should also link to the following related, more recent PR: https://github.com/databricks/spark-avro/pull/291. Although that PR is closed (Spark 2.4.0 provides Avro logicalTypes support), it can still be useful for people who are bound to an older Spark version.