bigflow
Hive Support
Read and write InputFormat/OutputFormat and SerDe information from/to the Hive Metastore, and read and write data from/to a Hive table or partition.
Do you have any plans or thoughts to support Hive IO formats?
If we are running on Spark, we can delegate the read path to Spark. However, when running locally or when writing data to Hive, we would have to implement all of the Hive IO formats in C++, which would be quite a complex project.
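To make the split concrete, here is a rough sketch of the decision described above. It is not Bigflow code: `Backend`, `Operation`, and `choose_hive_io_path` are hypothetical names used only for illustration, and the returned strings are placeholder labels rather than real components.

```python
from enum import Enum


class Backend(Enum):
    """Execution backends mentioned in this thread (hypothetical enum)."""
    SPARK = "spark"
    LOCAL = "local"


class Operation(Enum):
    """Direction of the Hive IO (hypothetical enum)."""
    READ = "read"
    WRITE = "write"


def choose_hive_io_path(backend: Backend, operation: Operation) -> str:
    """Return a label for which component would handle the Hive IO.

    Assumption: only reads on Spark can be delegated; every other
    combination needs a native (C++) implementation of the Hive format.
    """
    if backend is Backend.SPARK and operation is Operation.READ:
        return "delegate-to-spark"
    return "native-cpp-format"


if __name__ == "__main__":
    # Print the handler chosen for each backend/operation combination.
    for backend in Backend:
        for operation in Operation:
            print(backend.value, operation.value,
                  choose_hive_io_path(backend, operation))
```

Only one of the four combinations avoids the native implementation, which is why the C++ work is hard to sidestep with Spark alone.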
Another way would be to delegate this work to HCatalog, which looks like a good direction.
Yes, HCatalog is a good idea. But we need to test whether HCatalog supports Parquet files. Sometimes we want to specify the data storage format.