presto-hive-apache icon indicating copy to clipboard operation
presto-hive-apache copied to clipboard

Upgrade to parquet 1.8.1

Open zhenxiao opened this issue 8 years ago • 6 comments

@dain @electrum here is my draft. Need to update presto parquet code namespaces when a new presto-hive-apache version is released

zhenxiao avatar Mar 08 '16 02:03 zhenxiao

This looks good, but I like to have the Presto changes ready first instead so that we know if anything is missing. I made most of the changes here in this branch so that it compiles:

https://github.com/electrum/presto/tree/parquet

With the exception of TestParquetTimestampUtils which uses NanoTimeUtils from Hive which references the old Parquet classes. I'm wondering if there are other classes in Hive that we use which will need the old classes (and would fail with ClassNotFoundException or similar).

electrum avatar Jul 13 '16 16:07 electrum

There are other parts of the code that imports hive's parquet classes so this change should be tested properly to see whether we hit old/new parquet package/class conflicts.

nyigitbasi@lgml-nyigitbasi /tmp/presto (master) $ grep -r org.apache.hadoop.hive.ql.io.parquet --include \*.java *
presto-hive/src/main/java/com/facebook/presto/hive/HiveStorageFormat.java:import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
presto-hive/src/main/java/com/facebook/presto/hive/HiveStorageFormat.java:import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
presto-hive/src/main/java/com/facebook/presto/hive/HiveStorageFormat.java:import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
presto-hive/src/main/java/com/facebook/presto/hive/HiveUtil.java:import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
presto-hive/src/main/java/com/facebook/presto/hive/HiveUtil.java:import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetPageSourceFactory.java:            .add("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetRecordCursorProvider.java:            .add("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetTimestampUtils.java: * This class is equivalent of @see org.apache.hadoop.hive.ql.io.parquet.timestamp.NanoTime,
presto-hive/src/test/java/com/facebook/presto/hive/parquet/ParquetTester.java:import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
presto-hive/src/test/java/com/facebook/presto/hive/parquet/ParquetTester.java:import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
presto-hive/src/test/java/com/facebook/presto/hive/parquet/TestParquetTimestampUtils.java:import org.apache.hadoop.hive.ql.io.parquet.timestamp.NanoTimeUtils;
presto-hive/src/test/java/com/facebook/presto/hive/TestHiveFileFormats.java:import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
presto-hive/src/test/java/com/facebook/presto/hive/TestHiveFileFormats.java:import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;

nezihyigitbasi avatar Jul 13 '16 18:07 nezihyigitbasi

@electrum @zhenxiao I am currently looking at upgrading presto to parquet 1.8.1 and hit some issues during runtime. Presto's hive-apache package depends on hive 1.2 and that version still uses the old parquet packages (parquet.*), which causes these runtime issues.

For example, in my tests when querying a parquet table presto creates an instance of org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat and that class tries loading parquet.hadoop.ParquetInputFormat, which it cannot find. How do we want to handle this? Do we want to upgrade to hive 2.0, which uses the new parquet packages, or do we want to handle it some other way?

nezihyigitbasi avatar Aug 04 '16 20:08 nezihyigitbasi

@nezihyigitbasi MapredParquetInputFormat is a simple class. Might be easiest to fork it? Or bypass it entirely? Given we have a native Parquet reader, it might not be needed anymore.

electrum avatar Oct 17 '16 18:10 electrum

+1

ducky427 avatar Mar 02 '17 11:03 ducky427

@nezihyigitbasi -- what's the status of this PR?

drdee avatar Sep 01 '17 03:09 drdee