data-validation

Planned support for Hive tables?

Open zhaiyuyong opened this issue 5 years ago • 2 comments

zhaiyuyong avatar Mar 11 '19 14:03 zhaiyuyong

@zhaiyuyong TFDV uses Apache Beam to read input data, and Beam Python currently doesn't support reading Hive tables out of the box. There are currently two options:

  1. Export your Hive table as a CSV or TFRecord file and then use TFDV.
  2. Write a custom Beam transform to read the Hive table and decode it to TFDV's in-memory dictionary representation. Follow the instructions here to construct a pipeline with a custom decoder.
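
For option 2, the decoding step could look roughly like the sketch below. It is only an illustration under several assumptions: rows are assumed to arrive as plain tuples from some Hive client, `hive_row_to_feature_dict` and `build_decode_transform` are hypothetical names that are not part of TFDV or Beam, and the output uses plain Python lists where TFDV's in-memory representation would, as far as I know, use one NumPy array per feature. Check the representation your TFDV version expects before relying on it.

```python
from typing import Any, Dict, List, Optional, Sequence


def hive_row_to_feature_dict(
    column_names: Sequence[str], row: Sequence[Any]
) -> Dict[str, Optional[List[Any]]]:
    """Map one Hive row to {feature name: list of values}.

    Hypothetical helper: a NULL column value becomes None (a missing
    feature); every other value becomes a single-element list.
    """
    if len(column_names) != len(row):
        raise ValueError(
            "expected %d columns, got %d" % (len(column_names), len(row))
        )
    return {
        name: None if value is None else [value]
        for name, value in zip(column_names, row)
    }


def build_decode_transform(column_names: Sequence[str]):
    """Wrap the decoder in a Beam Map transform (hypothetical helper).

    apache_beam is imported lazily so the decoder above can be used and
    tested without Beam installed.
    """
    import apache_beam as beam  # assumption: apache-beam is installed

    return "DecodeHiveRows" >> beam.Map(
        lambda row: hive_row_to_feature_dict(column_names, row)
    )


# Example: one row of a table with columns (id, name, score), score is NULL.
example = hive_row_to_feature_dict(["id", "name", "score"], (1, "a", None))
```

The step that actually reads rows out of Hive is left out on purpose, since it depends on the client library and cluster setup you use.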

paulgc avatar Mar 20 '19 21:03 paulgc

@katsiapis @aaltay

paulgc avatar Mar 20 '19 21:03 paulgc