Presto fails to write to Iceberg tables

Open dbw9580 opened this issue 3 years ago • 15 comments

Presto 0.254.1 Hive 2.3.8

presto:iceberg_test> create table person (name varchar, age int, id int) with (location = 'file:///home/dbw/testdb/person5/', format = 'parquet');
CREATE TABLE
presto:iceberg_test> insert into person values ('alice', 18, 1000);

Query 20210610_120138_00057_g5t6w, FAILED, 1 node
Splits: 19 total, 17 done (89.47%)
0:02 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20210610_120138_00057_g5t6w failed: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;

Error stack

2021-06-10T20:01:40.447+0800    ERROR   remote-task-callback-43 com.facebook.presto.execution.StageExecutionStateMachine        Stage execution 20210610_120138_00057_g5t6w.1.0 failed
java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
        at org.apache.iceberg.parquet.TypeToMessageType.primitive(TypeToMessageType.java:145)
        at org.apache.iceberg.parquet.TypeToMessageType.field(TypeToMessageType.java:88)
        at org.apache.iceberg.parquet.TypeToMessageType.convert(TypeToMessageType.java:65)
        at org.apache.iceberg.parquet.ParquetSchemaUtil.convert(ParquetSchemaUtil.java:41)
        at com.facebook.presto.iceberg.IcebergFileWriterFactory.createParquetWriter(IcebergFileWriterFactory.java:114)
        at com.facebook.presto.iceberg.IcebergFileWriterFactory.createFileWriter(IcebergFileWriterFactory.java:77)
        at com.facebook.presto.iceberg.IcebergPageSink.createWriter(IcebergPageSink.java:303)
        at com.facebook.presto.iceberg.IcebergPageSink.getWriterIndexes(IcebergPageSink.java:284)
        at com.facebook.presto.iceberg.IcebergPageSink.writePage(IcebergPageSink.java:212)
        at com.facebook.presto.iceberg.IcebergPageSink.doAppend(IcebergPageSink.java:207)
        at com.facebook.presto.iceberg.IcebergPageSink.lambda$appendPage$0(IcebergPageSink.java:146)
        at com.facebook.presto.hive.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:24)
        at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
        at com.facebook.presto.hive.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:23)
        at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:86)
        at com.facebook.presto.iceberg.IcebergPageSink.appendPage(IcebergPageSink.java:146)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:66)
        at com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:324)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:428)
        at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
        at com.facebook.presto.$gen.Presto_0_254_1_a67de6c____20210610_114751_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

dbw9580 avatar Jun 10 '21 12:06 dbw9580

Good catch, thank you @dbw9580 ! I guess it's caused by a class conflict from different versions of the parquet library.

This PR https://github.com/prestodb/presto-hive-apache/pull/47 might fix the issue. @aweisberg and I are making the release now.

beinan avatar Jun 10 '21 21:06 beinan

@beinan I'm building presto with the latest presto-hive-apache module. I'm getting duplicate classes during compilation:

[WARNING] Found duplicate and different classes in [com.facebook.presto.hive:hive-apache:3.0.0-5, org.apache.yetus:audience-annotations:0.11.0]:
[WARNING]   org.apache.yetus.audience.InterfaceAudience
[WARNING]   org.apache.yetus.audience.InterfaceStability
[WARNING]   org.apache.yetus.audience.tools.ExcludePrivateAnnotationsStandardDoclet
[WARNING]   org.apache.yetus.audience.tools.IncludePublicAnnotationsStandardDoclet
[WARNING]   org.apache.yetus.audience.tools.RootDocProcessor
[WARNING]   org.apache.yetus.audience.tools.StabilityOptions
[WARNING] Found duplicate classes/resources in compile classpath.
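For anyone trying to track down which artifacts overlap, here is a minimal sketch (not part of the Presto build) that lists class entries appearing in more than one jar. The class name `DuplicateClassFinder` and its methods are hypothetical names for illustration. It only compares entry names, whereas the build warning above reports classes that are both duplicated and different, so treat its output as a starting point.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

class DuplicateClassFinder {
    /** Lists the .class entries of one jar file. */
    static List<String> classEntries(Path jar) throws IOException {
        try (JarFile file = new JarFile(jar.toFile())) {
            return file.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .collect(Collectors.toList());
        }
    }

    /** Maps each entry name that occurs in two or more jars to the jars containing it. */
    static Map<String, List<String>> duplicates(Map<String, List<String>> entriesByJar) {
        Map<String, List<String>> owners = new TreeMap<>();
        entriesByJar.forEach((jar, entries) -> {
            for (String entry : entries) {
                owners.computeIfAbsent(entry, key -> new ArrayList<>()).add(jar);
            }
        });
        // Keep only the entries that are provided by more than one jar.
        owners.values().removeIf(jars -> jars.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path to a jar file to compare.
        Map<String, List<String>> entriesByJar = new LinkedHashMap<>();
        for (String arg : args) {
            entriesByJar.put(arg, classEntries(Paths.get(arg)));
        }
        duplicates(entriesByJar).forEach((entry, jars) -> System.out.println(entry + " -> " + jars));
    }
}
```

Running it with the two jars named in the warning (for example the hive-apache and audience-annotations jars from the local Maven repository) should show the overlapping entries, which helps decide where an exclusion belongs.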

dbw9580 avatar Jun 15 '21 08:06 dbw9580

I managed to build the module by excluding org.apache.yetus:audience-annotations from presto-hive-apache's dependency on parquet-common:

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-common</artifactId>
    <version>${dep.parquet.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.yetus</groupId>
            <artifactId>audience-annotations</artifactId>
        </exclusion>
    </exclusions>
</dependency>

With the new module, I can write into the iceberg table, but cannot read back from it:

2021-06-15T21:01:16.535+0800    WARN    20210615_130113_00010_yp3r2.0.0.0-0-110 org.apache.iceberg.BaseTransaction      Failed to load committed metadata, skipping clean-up
com.facebook.presto.iceberg.UnknownTableTypeException: Not an Iceberg table: iceberg_test.person
        at com.facebook.presto.iceberg.HiveTableOperations.refresh(HiveTableOperations.java:182)
        at com.facebook.presto.iceberg.HiveTableOperations.current(HiveTableOperations.java:166)
        at org.apache.iceberg.BaseTransaction.committedFiles(BaseTransaction.java:404)
        at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:376)
        at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:220)
        at com.facebook.presto.iceberg.IcebergMetadata.finishInsert(IcebergMetadata.java:451)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:436)
        at com.facebook.presto.metadata.MetadataManager.finishInsert(MetadataManager.java:888)
        at com.facebook.presto.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3149)
        at com.facebook.presto.operator.TableFinishOperator.getOutput(TableFinishOperator.java:289)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
        at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
        at com.facebook.presto.$gen.Presto_0_256_SNAPSHOT_dd3e522____20210615_124215_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I can see the data written to disk and the metadata files created. Somehow Presto fails to recognize it.

dbw9580 avatar Jun 15 '21 13:06 dbw9580

@dbw9580 What metadata store are you using? Could you also share the SQL you're using to create the table? Thanks!

beinan avatar Jun 16 '21 01:06 beinan

@beinan Hi, I'm using Hive metastore, and the table is created with create table person (name varchar, age int, id int) with (location = 'file:///home/dbw/testdb/person5/', format = 'parquet');

dbw9580 avatar Jun 16 '21 01:06 dbw9580

Thank you @dbw9580 for your prompt reply! I will try to reproduce the bug and get back to you soon.

beinan avatar Jun 16 '21 01:06 beinan

It looks like a couple of the table parameters, including table_type, are dropped after any insert:

  1 | metadata_location          | file:///Users/beinan/w/tmp/metadata/00001-6bd8b56b-a589-4003-babd-89591b793bd5.metadata.json
  1 | previous_metadata_location | file:///Users/beinan/w/tmp/metadata/00000-3023be96-f8b0-408f-b250-fe6f3a27b674.metadata.json
  1 | transient_lastDdlTime      | 1623823254
  1 | totalSize                  | 0
  1 | numFiles                   | 0
  2 | totalSize                  | 0
  2 | numFiles                   | 0
  2 | metadata_location          | file:///Users/beinan/w/tmp/metadata/00001-2eab1433-d38e-43a2-862f-c52bd4896aa1.metadata.json
  2 | previous_metadata_location | file:///Users/beinan/w/tmp/metadata/00000-edf1393b-d9ba-4a10-8393-d5bc10c7bfa7.metadata.json
  2 | transient_lastDdlTime      | 1623823996
  3 | metadata_location          | file:///Users/beinan/w/tmp/metadata/00000-d9db195c-3fb4-4bea-8fed-b3f5481900bc.metadata.json
  3 | totalSize                  | 0
  3 | table_type                 | iceberg
  3 | numFiles                   | 0
  3 | EXTERNAL                   | TRUE

I will post a PR to fix this issue very soon
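To illustrate why the dropped parameter matters, here is a minimal sketch, assuming the read path rejects any table whose table_type parameter is not "iceberg" (which is what the "Not an Iceberg table" exception suggests). This is not the actual Presto code or the actual fix; the class `TableParamsSketch`, its methods, and the example paths are hypothetical. The idea is that a commit should copy the existing parameters and only update the metadata locations, rather than writing a fresh map.

```java
import java.util.HashMap;
import java.util.Map;

class TableParamsSketch {
    static final String TABLE_TYPE = "table_type";
    static final String METADATA_LOCATION = "metadata_location";
    static final String PREVIOUS_METADATA_LOCATION = "previous_metadata_location";

    /** Assumed check: a table counts as Iceberg only if table_type is present and equals "iceberg". */
    static boolean isIcebergTable(Map<String, String> parameters) {
        return "iceberg".equalsIgnoreCase(parameters.get(TABLE_TYPE));
    }

    /** Returns the parameters after a commit, preserving every key that is not being updated. */
    static Map<String, String> afterCommit(Map<String, String> existing, String newMetadataLocation) {
        // Start from a copy so table_type, EXTERNAL and other keys survive the commit.
        Map<String, String> updated = new HashMap<>(existing);
        String previous = existing.get(METADATA_LOCATION);
        if (previous != null) {
            updated.put(PREVIOUS_METADATA_LOCATION, previous);
        }
        updated.put(METADATA_LOCATION, newMetadataLocation);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> created = new HashMap<>();
        created.put(TABLE_TYPE, "iceberg");
        created.put("EXTERNAL", "TRUE");
        created.put(METADATA_LOCATION, "file:///tmp/example/metadata/v0.metadata.json"); // hypothetical path

        Map<String, String> committed = afterCommit(created, "file:///tmp/example/metadata/v1.metadata.json");
        System.out.println("still an Iceberg table: " + isIcebergTable(committed));
    }
}
```

In the dump above, tables 1 and 2 (after an insert) have no table_type row, while table 3 (freshly created) does, which matches what a check like `isIcebergTable` would trip over.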

beinan avatar Jun 16 '21 06:06 beinan

Presto 0.263-SNAPSHOT Hive 2.3.9 iceberg 0.11.1

I hit this issue again, and upgrading Parquet does not solve it for me. It is a conflict between parquet-column-1.11.0.jar and hive-apache-3.0.0-3.jar: hive-apache-3.0.0-3 contains the class org.apache.parquet.schema.Types, but its copy does not have the method org.apache.parquet.schema.Types.Builder#as(org.apache.parquet.schema.LogicalTypeAnnotation).
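A quick way to confirm this kind of conflict is to ask the JVM which jar a class was loaded from and whether the expected method is there. The sketch below is a standalone probe using only JDK reflection; the class name `ClasspathProbe` and its methods are hypothetical, and it matches methods by name and parameter type names only. Note that Presto loads each plugin in its own classloader, so running this outside the server with the plugin's jars on the classpath is only an approximation of what the server sees.

```java
import java.security.CodeSource;
import java.util.Arrays;

class ClasspathProbe {
    /** Returns the location the class was loaded from, or a placeholder for bootstrap classes. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** True if the class exposes a public method with this name and exactly these parameter type names. */
    static boolean hasPublicMethod(Class<?> type, String name, String... parameterTypeNames) {
        return Arrays.stream(type.getMethods())
                .filter(method -> method.getName().equals(name))
                .anyMatch(method -> Arrays.equals(
                        Arrays.stream(method.getParameterTypes()).map(Class::getName).toArray(String[]::new),
                        parameterTypeNames));
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // With fewer than two arguments, fall back to a JDK class so the sketch runs anywhere.
        boolean useDefaults = args.length < 2;
        String className = useDefaults ? "java.lang.String" : args[0];
        String methodName = useDefaults ? "concat" : args[1];
        String[] parameterTypes = useDefaults
                ? new String[] {"java.lang.String"}
                : Arrays.copyOfRange(args, 2, args.length);

        // Load without initializing the class; nested classes need the binary name (Outer$Inner).
        Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
        System.out.println(className + " loaded from " + locationOf(type));
        System.out.println(methodName + Arrays.toString(parameterTypes) + " present: "
                + hasPublicMethod(type, methodName, parameterTypes));
    }
}
```

For the error in this thread, the arguments would be 'org.apache.parquet.schema.Types$PrimitiveBuilder', as, and org.apache.parquet.schema.LogicalTypeAnnotation, run with the Iceberg plugin's jars on the classpath. If the reported location is the hive-apache jar and the method is reported as absent, that is the conflict described above.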

... presto:iceberg_test> INSERT INTO person VALUES ('alice', 18, 1000);

Query 20210922_074309_00001_5gc8t, FAILED, 1 node
Splits: 19 total, 1 done (5.26%)
0:03 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20210922_074309_00001_5gc8t failed: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;

Error log:

2021-09-22T16:59:32.980+0800    ERROR   remote-task-callback-8  com.facebook.presto.execution.StageExecutionStateMachine        Stage execution 20210922_085929_00001_6ta4c.1.0 failed
java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
        at org.apache.iceberg.parquet.TypeToMessageType.primitive(TypeToMessageType.java:145)
        at org.apache.iceberg.parquet.TypeToMessageType.field(TypeToMessageType.java:88)
        at org.apache.iceberg.parquet.TypeToMessageType.convert(TypeToMessageType.java:65)
        at org.apache.iceberg.parquet.ParquetSchemaUtil.convert(ParquetSchemaUtil.java:43)
        at com.facebook.presto.iceberg.IcebergFileWriterFactory.createParquetWriter(IcebergFileWriterFactory.java:150)
        at com.facebook.presto.iceberg.IcebergFileWriterFactory.createFileWriter(IcebergFileWriterFactory.java:111)
        at com.facebook.presto.iceberg.IcebergPageSink.createWriter(IcebergPageSink.java:299)
        at com.facebook.presto.iceberg.IcebergPageSink.getWriterIndexes(IcebergPageSink.java:283)
        at com.facebook.presto.iceberg.IcebergPageSink.writePage(IcebergPageSink.java:213)
        at com.facebook.presto.iceberg.IcebergPageSink.doAppend(IcebergPageSink.java:208)
        at com.facebook.presto.iceberg.IcebergPageSink.lambda$appendPage$0(IcebergPageSink.java:147)
        at com.facebook.presto.hive.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:24)
        at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
        at com.facebook.presto.hive.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:23)
        at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:86)
        at com.facebook.presto.iceberg.IcebergPageSink.appendPage(IcebergPageSink.java:147)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:66)
        at com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:338)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:428)
        at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
        at com.facebook.presto.$gen.Presto_0_261_fd07867____20210922_085907_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

wandongchen avatar Sep 22 '21 09:09 wandongchen

Upgrading presto-hive-apache to 3.0.0-5 (#16274) solves this error:

org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;

wandongchen avatar Sep 22 '21 11:09 wandongchen

Thank you @wandongchen for reporting this to us! I cannot merge my PR yet because it causes other dependency issues that fail CI (although it functions correctly). We are still working on that: https://github.com/prestodb/presto/pull/16545#issuecomment-937033816. It looks like we're very close to fixing the issue.

beinan avatar Oct 06 '21 20:10 beinan

@beinan @dbw9580 I encountered this issue too. What can I do for workaround?

maobaolong avatar Nov 03 '21 12:11 maobaolong

@beinan @dbw9580 I encountered this issue too. What can I do for workaround?

I don't remember precisely what I did to fix this, but maybe you can try cherry-picking https://github.com/prestodb/presto/pull/16274 into your Presto build?

dbw9580 avatar Nov 03 '21 12:11 dbw9580

@maobaolong and @dbw9580, we just upgraded presto-hive-apache to 3.0.0-7, which should fix the issue. Could you try it with the most recent master, or cherry-pick this one: https://github.com/prestodb/presto/pull/16923

beinan avatar Nov 03 '21 18:11 beinan

@dbw9580 I am facing the same problem you did:

[WARNING] Found duplicate and different classes in [com.facebook.presto.hive:hive-apache:2.3.7-dp6.44.1.1, org.apache.parquet:parquet-encoding:1.10.1]:
[WARNING]   org.apache.parquet.column.values.bitpacking.BaseBitPackingReader
[WARNING]   org.apache.parquet.column.values.bitpacking.BaseBitPackingWriter
[WARNING]   org.apache.parquet.column.values.bitpacking.BitPacking
[WARNING]   org.apache.parquet.column.values.bitpacking.ByteBasedBitPackingEncoder
[WARNING]   org.apache.parquet.column.values.bitpacking.ByteBitPackingBE
[WARNING]   org.apache.parquet.column.values.bitpacking.ByteBitPackingForLongBE
[WARNING]   org.apache.parquet.column.values.bitpacking.ByteBitPackingForLongLE
[WARNING]   org.apache.parquet.column.values.bitpacking.ByteBitPackingLE
[WARNING]   org.apache.parquet.column.values.bitpacking.BytePacker
[WARNING]   org.apache.parquet.column.values.bitpacking.BytePackerFactory
[WARNING]   org.apache.parquet.column.values.bitpacking.BytePackerForLong

Which dependency should I add the exclusion to? Can you tell me how to fix this? Any help would be appreciated. :)

Crossoverrr avatar Jul 14 '22 04:07 Crossoverrr

Hello @Crossoverrr , are you running presto with your custom build or master?

beinan avatar Jul 20 '22 05:07 beinan