spark-avro

TimeUnit conversion for BigQuery compatibility

Open · devlucasc opened this pull request on Mar 04 '18 · 3 comments

These changes add an option to write timestamps as long values with a specified precision, for example microseconds. Setting the precision option makes the Avro output compatible with BigQuery, which reads timestamps as microseconds.
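
A minimal sketch of how the proposed option might be used (the option name "timestampPrecision" is illustrative, not the final API of this patch):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.read.parquet("/path/to/input")  // any DataFrame with a timestamp column

    // Write timestamps as long microseconds so the resulting Avro files
    // can be loaded into BigQuery directly.
    df.write
      .format("com.databricks.spark.avro")
      .option("timestampPrecision", "microseconds")  // assumed option name
      .save("/path/to/output")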

devlucasc · Mar 04 '18

Codecov Report

Merging #271 into master will increase coverage by 0.09%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #271      +/-   ##
==========================================
+ Coverage   92.21%   92.30%   +0.09%
==========================================
  Files           5        5
  Lines         321      325       +4
  Branches       43       48       +5
==========================================
+ Hits          296      300       +4
  Misses         25       25

codecov-io · Mar 04 '18

Hi @devlucasc, in my opinion:

  1. If we add this option, we should add it for all the data sources, not just for Avro.
  2. We can use .map or a SQL statement to change the output timestamp unit, which is simple and straightforward to me (see the sketch below).
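
For instance, a minimal sketch of that alternative, assuming a timestamp column named "ts":

    import org.apache.spark.sql.functions.col

    // Convert the timestamp to long microseconds before writing, instead of
    // adding a data source option. Casting a timestamp to double yields
    // fractional epoch seconds, so multiplying by 1e6 keeps microseconds.
    val converted = df.withColumn(
      "ts",
      (col("ts").cast("double") * 1000000L).cast("long")
    )

    // Equivalent SQL:
    //   SELECT CAST(CAST(ts AS DOUBLE) * 1000000 AS LONG) AS ts FROM events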

gengliangwang · Jun 01 '18

Hi @gengliangwang

  1. The problem only occurs in the Avro data source.
  2. .map does not work well when the schema has complex structures such as arrays, structs, or arrays of structs containing a timestamp field. In that case you need a combination of map, flatMap, and explode to handle the conversion (see the sketch below).
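
A sketch of what that looks like for an array of structs (schema and column names are illustrative): the array has to be exploded, the nested timestamp rewritten, and the array rebuilt.

    import org.apache.spark.sql.functions._

    // df schema: id LONG, events ARRAY<STRUCT<ts: TIMESTAMP, tag: STRING>>
    val converted = df
      .select(col("id"), explode(col("events")).as("event"))  // flatten the array
      .withColumn("event", struct(
        (col("event.ts").cast("double") * 1000000L).cast("long").as("ts"),
        col("event.tag").as("tag")))                          // rewrite the nested field
      .groupBy("id")
      .agg(collect_list("event").as("events"))                // rebuild the array (order not guaranteed)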

Kind regards.

devlucasc · Jun 12 '18