
Support for logical datatypes like Decimal type

Open cpbhagtani opened this issue 8 years ago • 14 comments

Avro doesn't support very big numbers directly. It supports them through logical types, where you specify the value as a string type but record the actual data type of the field in the logicalType attribute. The following is an example of a decimal type:

{"type": "record", "name": "test", "fields" : [ {"name": "a","type": "string",   "logicalType": "decimal",   "precision": 4,   "scale": 2 }, {"name": "b", "type": "string"} ]}

This pull request adds support for reading such a datatype. For now I have added only the decimal type, but we can add more logical types. In the DataFrame it actually appears as a decimal(4,2) type.
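As a minimal sketch of what this looks like from the Spark side, assuming the PR is applied (the format name is spark-avro's standard one; the file path is hypothetical):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().appName("decimal-logical-type").getOrCreate()

// With the PR applied, the string-backed field "a" from the schema above
// is surfaced directly as decimal(4,2).
val df = spark.read.format("com.databricks.spark.avro").load("/tmp/test.avro")
df.printSchema() // a: decimal(4,2), b: string

// Without the PR, "a" arrives as a plain string and has to be cast by hand:
val casted = df.withColumn("a", col("a").cast(DecimalType(4, 2)))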

cpbhagtani avatar Mar 09 '16 05:03 cpbhagtani

Current coverage is 94.02%

Merging #121 into master will increase coverage by +0.57% as of bb290a5

@@            master    #121   diff @@
======================================
  Files            6       6       
  Stmts          275     318    +43
  Branches        45      60    +15
  Methods          0       0       
======================================
+ Hit            257     299    +42
  Partial          0       0       
- Missed          18      19     +1

Review entire Coverage Diff as of bb290a5

Powered by Codecov. Updated on successful CI builds.

codecov-io avatar Mar 09 '16 05:03 codecov-io

In addition to adding support for DateType, this fix is important to be able to read and write logical types in spark-bigquery.

To fully incorporate logical types, the logicalType attribute should be set for all logical types in convertTypeToAvro() when building the Avro schema.
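A minimal sketch of what that could look like with Avro 1.8's LogicalTypes API (the helper name and surrounding wiring are assumptions, not the actual spark-avro code):

import org.apache.avro.{LogicalTypes, Schema, SchemaBuilder}
import org.apache.spark.sql.types.DecimalType

// Sketch: annotate the underlying Avro bytes type with the decimal logical
// type so precision and scale survive the round trip through Avro.
def decimalToAvro(dt: DecimalType): Schema =
  LogicalTypes.decimal(dt.precision, dt.scale)
    .addToSchema(SchemaBuilder.builder().bytesType())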

ghost avatar Oct 26 '16 07:10 ghost

@cpbhagtani Do you have any plans to add support for bytes/decimal types soon?

i.e. to support this schema:

{
  "type" : "record",
  "name" : "avro_decimal1",
  "namespace" : "default",
  "fields" : [ {
    "name" : "dec_col1",
    "type" : [ "null", {
      "type" : "bytes",             <-----  bytes type instead of string type
      "logicalType" : "decimal",    <-----
      "precision" : 38,
      "scale" : 35
    } ],
    "default" : null
  } ]
}
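For reference, decoding such a bytes-backed decimal by hand follows the Avro spec: the bytes hold the two's-complement big-endian unscaled value, and the scale comes from the schema. A minimal sketch (names are illustrative):

import java.math.{BigDecimal, BigInteger}
import java.nio.ByteBuffer

// Avro hands a bytes field to the reader as a ByteBuffer; rebuild the
// decimal from the unscaled two's-complement value plus the schema's scale.
def decodeDecimal(buf: ByteBuffer, scale: Int): BigDecimal = {
  val bytes = new Array[Byte](buf.remaining())
  buf.get(bytes)
  new BigDecimal(new BigInteger(bytes), scale)
}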

progrexor avatar Nov 30 '16 10:11 progrexor

@progrexor, no. Currently we are sending decimal as a string type with the logical type.

cpbhagtani avatar Nov 30 '16 11:11 cpbhagtani

@cpbhagtani

Any plans on updating this code to be compatible with the updated version of spark-avro?

i.e. resolving the branch conflicts to support spark-avro 3.2.0.

karthikkadiyam avatar Mar 30 '17 20:03 karthikkadiyam

I am also interested in this issue. Have you decided on the correct approach for the fixes in the pull request, and is the only problem that they currently do not merge cleanly?

hsn10 avatar Apr 02 '17 20:04 hsn10

Hi,

Sorry for asking the same question again. Do you have any plans to add support for bytes/decimal types soon?

Thanks, Alind

alind-billore avatar Apr 05 '17 13:04 alind-billore

@karthikkadiyam I will make my PR compatible with 3.2.0 soon

cpbhagtani avatar Apr 05 '17 14:04 cpbhagtani

@hsn10 my PR is compatible with branch 2.0. I will try to make it compatible with 3.0

cpbhagtani avatar Apr 05 '17 14:04 cpbhagtani

@alind-billore decimal is already supported in my PR through the Avro logical type; bytes is not supported.

cpbhagtani avatar Apr 05 '17 14:04 cpbhagtani

@cpbhagtani Thanks a lot for your reply! :)

alind-billore avatar Apr 06 '17 07:04 alind-billore

@cpbhagtani Any update on when we can expect this PR to be compatible with 3.2? I tried it myself but ran into some runtime exceptions and couldn't fix them.

karthikkadiyam avatar May 23 '17 15:05 karthikkadiyam

In the Avro spec:

A decimal logical type annotates Avro bytes or fixed types
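So a spec-conformant schema can also use a fixed base type, for example (an illustrative schema; 16 bytes is enough to hold precision 38):

{
  "type" : "fixed",
  "name" : "dec_fixed",
  "size" : 16,
  "logicalType" : "decimal",
  "precision" : 38,
  "scale" : 35
}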

@cpbhagtani Can you update this PR? I will follow it.

gengliangwang avatar Nov 06 '17 12:11 gengliangwang

We should also link to the following related, more recent PR (even though it is closed): https://github.com/databricks/spark-avro/pull/291. Even though that PR was closed (since Spark 2.4.0 provides Avro logicalTypes support), it can still be useful for people who are bound to an older Spark version.
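For anyone on Spark 2.4.0 or later, a minimal sketch of the built-in Avro source handling decimals (requires the org.apache.spark:spark-avro package on the classpath; the path is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-decimals").getOrCreate()

// Spark 2.4's built-in reader maps Avro's decimal logical type
// (bytes- or fixed-backed) to Spark's DecimalType automatically.
val df = spark.read.format("avro").load("/path/to/decimals.avro")
df.printSchema() // dec_col1: decimal(38,35)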

lhoss avatar Dec 07 '18 15:12 lhoss