
WIP/RFC: DELETE and UPDATE built on MERGE machinery

Open djsstarburst opened this issue 2 years ago • 10 comments

Description

This WIP PR supports SQL DELETE and SQL UPDATE using the back-end MERGE machinery. This PR is submitted to run the tests in the CI build to see what's broken.

Related issues, pull requests, and links

  • Fixes #13621

djsstarburst avatar Aug 30 '22 18:08 djsstarburst

Cassandra needs to implement getMergeRowIdColumnHandle

electrum avatar Sep 15 '22 18:09 electrum

This is a real failure:

Caused by: java.lang.IllegalArgumentException: Unsupported local exchange partitioning MERGE [insert = hive:HivePartitioningHandle{buckets=1, hiveTypes=[]}]
	at io.trino.operator.exchange.LocalExchange.computeBufferCount(LocalExchange.java:366)
	at io.trino.operator.exchange.LocalExchange.<init>(LocalExchange.java:98)
	at io.trino.sql.planner.LocalExecutionPlanner$Visitor.createLocalExchange(LocalExecutionPlanner.java:3573)
	at io.trino.sql.planner.LocalExecutionPlanner$Visitor.visitExchange(LocalExecutionPlanner.java:3459)
	at io.trino.sql.planner.LocalExecutionPlanner$Visitor.visitExchange(LocalExecutionPlanner.java:847)
	at io.trino.sql.planner.plan.ExchangeNode.accept(ExchangeNode.java:243)
	at io.trino.sql.planner.LocalExecutionPlanner$Visitor.visitMergeWriter(LocalExecutionPlanner.java:3376)
	at io.trino.sql.planner.LocalExecutionPlanner$Visitor.visitMergeWriter(LocalExecutionPlanner.java:847)
	at io.trino.sql.planner.plan.MergeWriterNode.accept(MergeWriterNode.java:102)
	at io.trino.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:569)
	at io.trino.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:492)
	at io.trino.execution.SqlTaskExecutionFactory.create(SqlTaskExecutionFactory.java:77)
	at io.trino.execution.SqlTask.updateTask(SqlTask.java:438)
	at io.trino.execution.SqlTaskManager.doUpdateTask(SqlTaskManager.java:498)
	at io.trino.execution.SqlTaskManager.lambda$updateTask$9(SqlTaskManager.java:453)
	at io.trino.$gen.Trino_395_137_g3a8fc2e____20220915_192610_2.call(Unknown Source)
	at io.trino.execution.SqlTaskManager.updateTask(SqlTaskManager.java:453)
	at io.trino.server.TaskResource.createOrUpdateTask(TaskResource.java:151)
	at jdk.internal.reflect.GeneratedMethodAccessor532.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)

electrum avatar Sep 15 '22 20:09 electrum

We might need some logic in FaultTolerantQueryScheduler#createBucketToPartitionMap to create the bucketToPartitionMap for the "Bucket to partition must be set before a partition function can be created" error. Though it would be helpful if someone more familiar with this like @dain or @arhimondr could look at why this occurs.

Caused by: java.lang.IllegalArgumentException: Bucket to partition must be set before a partition function can be created
	at io.trino.sql.planner.NodePartitioningManager.lambda$getPartitionFunction$0(NodePartitioningManager.java:83)
	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
	at io.trino.sql.planner.NodePartitioningManager.getPartitionFunction(NodePartitioningManager.java:83)
	at io.trino.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:531)
	at io.trino.execution.SqlTaskExecutionFactory.create(SqlTaskExecutionFactory.java:77)
	at io.trino.execution.SqlTask.updateTask(SqlTask.java:438)
	at io.trino.execution.SqlTaskManager.doUpdateTask(SqlTaskManager.java:498)
	at io.trino.execution.SqlTaskManager.lambda$updateTask$9(SqlTaskManager.java:453)
	at io.trino.$gen.Trino_testversion____20220915_203802_36.call(Unknown Source)
	at io.trino.execution.SqlTaskManager.updateTask(SqlTaskManager.java:453)
	at io.trino.server.TaskResource.createOrUpdateTask(TaskResource.java:151)
	at jdk.internal.reflect.GeneratedMethodAccessor575.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
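
For readers unfamiliar with the term, a bucket-to-partition map assigns each connector bucket to the output partition that will process it. The following is a minimal sketch of that idea in plain Java, assuming a simple round-robin assignment. The class and method names are hypothetical, and this is not the logic in FaultTolerantQueryScheduler#createBucketToPartitionMap.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Trino's actual implementation.
final class BucketToPartitionSketch
{
    private BucketToPartitionSketch() {}

    // Assign each bucket to a partition round-robin: bucket b -> b % partitionCount.
    static int[] roundRobin(int bucketCount, int partitionCount)
    {
        if (bucketCount < 0 || partitionCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be >= 0 and partitionCount must be > 0");
        }
        int[] bucketToPartition = new int[bucketCount];
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            bucketToPartition[bucket] = bucket % partitionCount;
        }
        return bucketToPartition;
    }

    public static void main(String[] args)
    {
        // prints [0, 1, 0, 1, 0]
        System.out.println(Arrays.toString(roundRobin(5, 2)));
    }
}
```

A partition function built on such a map can only route rows once the map exists, which is consistent with the error message above.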

electrum avatar Sep 15 '22 20:09 electrum

Delete the overridden testDelete and testUpdate methods from BaseIcebergFailureRecoveryTest. We can use the base ones now.

electrum avatar Sep 16 '22 01:09 electrum

Delete the overridden testDelete and testUpdate methods from BaseIcebergFailureRecoveryTest. We can use the base ones now.

Removed

djsstarburst avatar Sep 16 '22 01:09 djsstarburst

There are still a bunch of test failures in the Delta Lake connector, and I think they reflect problems in the engine implementation, revolving around partition columns and values, and predicate pushdown:

  • testDeletePushdown and testUpdatePushdown are failing because they assert that the count of rows read is a specific number, and the number is larger with the new implementation because pushdown of the MergeWriterNode is failing. The match in PushMergeWriterDeleteIntoConnector does not succeed because there is a filter node between the project and the table scan. I'm unsure how this test works.
  • testUpdateOnPartitionKey is failing because the value of the partition column is null in a row processed by MergeWriterOperator. I'm unsure how this is supposed to be handled.
  • testUpdateWithPartitionKeyPredicate is failing because one more row was updated than should have been updated. I think the real problem is that the scan was supposed to be restricted to a partition, but that information got lost.
  • All the testXXXVacuum tests are failing because they expect a specific set of files to be created, and find one more than expected.
  • testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters is failing, and it really is because of the special characters, since removing them makes the test pass. I think this must mean that some partition data structure is not properly quoting or escaping the column.
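
To illustrate the first bullet, here is a toy sketch of why a rule that expects an exact plan shape stops matching once a filter node appears between the project and the table scan. The node model, the string node kinds, and the method names are all hypothetical; this is not the code in PushMergeWriterDeleteIntoConnector.

```java
// Toy plan model; every name here is hypothetical and only loosely mirrors the discussion.
final class PushdownMatchSketch
{
    private PushdownMatchSketch() {}

    static final class Node
    {
        final String kind;
        final Node source; // null for a leaf

        Node(String kind, Node source)
        {
            this.kind = kind;
            this.source = source;
        }
    }

    // Build a linear plan from root to leaf, e.g. chain("MergeWriter", "Project", "TableScan").
    static Node chain(String... kinds)
    {
        Node node = null;
        for (int i = kinds.length - 1; i >= 0; i--) {
            node = new Node(kinds[i], node);
        }
        return node;
    }

    // Strict rule: matches only MergeWriter -> Project -> TableScan.
    static boolean matchesStrict(Node root)
    {
        return hasShape(root, "MergeWriter", "Project", "TableScan");
    }

    // Tolerant rule: also accepts a Filter between Project and TableScan.
    static boolean matchesAllowingFilter(Node root)
    {
        return matchesStrict(root)
                || hasShape(root, "MergeWriter", "Project", "Filter", "TableScan");
    }

    // True when the chain starting at root consists of exactly the given kinds.
    private static boolean hasShape(Node root, String... kinds)
    {
        Node current = root;
        for (String kind : kinds) {
            if (current == null || !current.kind.equals(kind)) {
                return false;
            }
            current = current.source;
        }
        return current == null;
    }

    public static void main(String[] args)
    {
        Node withFilter = chain("MergeWriter", "Project", "Filter", "TableScan");
        System.out.println(matchesStrict(withFilter));         // prints false
        System.out.println(matchesAllowingFilter(withFilter)); // prints true
    }
}
```

Accepting the extra filter is only safe if the predicate is still applied somewhere, for example by handing it to the connector; the sketch shows the matching problem, not a fix.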

djsstarburst avatar Sep 16 '22 17:09 djsstarburst

All the fault-tolerant delete and update tests are failing for the TASK type because a partition-related exception is raised before the expected injected exception. @electrum pinged the fault-tolerant folks for suggestions.

The only other Iceberg test that is failing is BaseIcebergConnectorSmokeTest.testDeleteRowsConcurrently. The error is:

Found new conflicting delete files that can apply to records matching true: [s3://test-iceberg-minio-smoke-test-t3060os0aq/tpch_orc/test_concurrent_delete4dhgtz3qug-2bf0653f48824875996a3f13c6cadc5b/data/20220916_181000_00013_8kkzy-6128b746-a3cf-461e-8b96-dc8c3d9d664f.orc]

djsstarburst avatar Sep 16 '22 18:09 djsstarburst

I pushed a commit that fixed all of the fault-tolerant tests by adding a method to MergePartitioningHandle to produce the FaultTolerantPartitioningScheme, and another to produce the TaskSource needed by the FaultTolerantPartitioningQueryFactory. With those changes, and the removal of a couple of test band-aids in BaseDeltaFailureRecoveryTest, all the fault-tolerant tests now pass.

Below is the complete list of currently-failing tests, including duplicates. Here is the summary for each connector:

  • Hive. All tests pass.
  • Raptor. All tests pass.
  • Iceberg. One failure: testDeleteRowsConcurrently.
  • Delta Lake. There are a bunch of failures. Many are caused by failed delete or predicate pushdown. Some are caused by expecting specific data files to be generated. I'm not sure if these are valid. These all need to be debugged.
  • Kudu. Lots of Kudu tests are failing, all with messages like Column (name: nationkey, index: 0) is of type int64 but was requested as a type [Type: string]. Something is wrong with Kudu merge rowids, and I suspect fixing that problem will fix lots of Kudu tests.

The complete inventory of current failures, with lots of duplicates:

./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateOnPartitionKey
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDatabricksConnectorTest.testAddColumnAndVacuum
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeOssDeltaLakeConnectorTest.testAddColumnAndVacuum
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllDatabricks
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllOssDeltaLake
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteOnPartitionKey
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeLegacyWriterConnectorSmokeTest.testVacuum

./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateOnPartitionKey
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDatabricksConnectorTest.testAddColumnAndVacuum
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeOssDeltaLakeConnectorTest.testAddColumnAndVacuum
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllDatabricks
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllOssDeltaLake
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteOnPartitionKey
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters
./test (plugintrino-delta-lake)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeLegacyWriterConnectorSmokeTest.testVacuum

./28_test (plugintrino-iceberg).txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioOrcConnectorSmokeTest.testDeleteRowsConcurrently
./28_test (plugintrino-iceberg).txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergConnectorSmokeTest.testDeleteRowsConcurrently
./28_test (plugintrino-iceberg).txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioParquetConnectorSmokeTest.testDeleteRowsConcurrently
./28_test (plugintrino-iceberg).txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioAvroConnectorSmokeTest.testDeleteRowsConcurrently

./test (plugintrino-iceberg)/8_Maven Tests.txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioOrcConnectorSmokeTest.testDeleteRowsConcurrently
./test (plugintrino-iceberg)/8_Maven Tests.txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergConnectorSmokeTest.testDeleteRowsConcurrently
./test (plugintrino-iceberg)/8_Maven Tests.txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioParquetConnectorSmokeTest.testDeleteRowsConcurrently
./test (plugintrino-iceberg)/8_Maven Tests.txt	ERROR	pool-4-thread-1	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioAvroConnectorSmokeTest.testDeleteRowsConcurrently

./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithEmptyInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithDisabledInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithStandardInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithDisabledInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithEmptyInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithStandardInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithDisabledInferSchemaConnectorSmokeTest.testRowLevelDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithStandardInferSchemaConnectorSmokeTest.testRowLevelDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithEmptyInferSchemaConnectorSmokeTest.testRowLevelDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithEmptyInferSchemaConnectorSmokeTest.testRowLevelDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithDisabledInferSchemaConnectorSmokeTest.testRowLevelDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithStandardInferSchemaConnectorSmokeTest.testRowLevelDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithEmptyInferSchemaConnectorSmokeTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithDisabledInferSchemaConnectorSmokeTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithStandardInferSchemaConnectorSmokeTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithDisabledInferSchemaConnectorSmokeTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithEmptyInferSchemaConnectorSmokeTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithStandardInferSchemaConnectorSmokeTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testDelete
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testDeleteWithLike
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateAllValues
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateWithPredicates
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
./26_test (plugintrino-kudu).txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithEmptyInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithDisabledInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithStandardInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithDisabledInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithEmptyInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithStandardInferSchemaConnectorSmokeTest.testDeleteAllDataFromTable
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithDisabledInferSchemaConnectorSmokeTest.testRowLevelDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithStandardInferSchemaConnectorSmokeTest.testRowLevelDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithEmptyInferSchemaConnectorSmokeTest.testRowLevelDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithEmptyInferSchemaConnectorSmokeTest.testRowLevelDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithDisabledInferSchemaConnectorSmokeTest.testRowLevelDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithStandardInferSchemaConnectorSmokeTest.testRowLevelDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithEmptyInferSchemaConnectorSmokeTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithDisabledInferSchemaConnectorSmokeTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduLatestWithStandardInferSchemaConnectorSmokeTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithDisabledInferSchemaConnectorSmokeTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithEmptyInferSchemaConnectorSmokeTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduWithStandardInferSchemaConnectorSmokeTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testDelete
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testDeleteWithLike
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateAllValues
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateWithPredicates
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-2	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	ERROR	pool-3-thread-1	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

djsstarburst avatar Sep 20 '22 22:09 djsstarburst

I pushed a commit that fixes most of the Delta Lake tests that failed because they tried to delete or update rows whose partition column value was null. The bug was that DeltaLakeMergeSink.FileDeletion was copying the list of partition values using ImmutableList, which does not allow null values. Fixed by copying as an ArrayList.
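
A minimal sketch of this failure mode follows. To keep it runnable without Guava, it uses the JDK's List.copyOf as a stand-in; like Guava's ImmutableList.copyOf, it throws NullPointerException when any element is null. The class name, method names, and sample values are hypothetical and are not taken from DeltaLakeMergeSink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; List.copyOf stands in for Guava's ImmutableList.copyOf.
final class NullPartitionValuesSketch
{
    private NullPartitionValuesSketch() {}

    // Returns true when the null-rejecting immutable copy fails for these values.
    static boolean immutableCopyRejects(List<String> partitionValues)
    {
        try {
            List.copyOf(partitionValues);
            return false;
        }
        catch (NullPointerException e) {
            return true;
        }
    }

    // ArrayList permits null elements, so a null partition value survives the copy.
    static List<String> nullTolerantCopy(List<String> partitionValues)
    {
        return new ArrayList<>(partitionValues);
    }

    public static void main(String[] args)
    {
        // A row whose second partition column is null (sample data, made up for the sketch).
        List<String> partitionValues = Arrays.asList("2022-09-21", null);

        System.out.println(immutableCopyRejects(partitionValues)); // prints true
        System.out.println(nullTolerantCopy(partitionValues));     // prints [2022-09-21, null]
    }
}
```

The trade-off is that the ArrayList copy is mutable, so callers should treat it as read-only or wrap it with Collections.unmodifiableList, which does tolerate null elements.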

djsstarburst avatar Sep 21 '22 14:09 djsstarburst

I pushed a commit that fixes all the Kudu smoke test failures.

djsstarburst avatar Sep 21 '22 16:09 djsstarburst

A bunch of test failures have been fixed by the last three commits. Here are the remaining test failures that are related to the changes in this PR:

Iceberg

All tests passing

Raptor

All tests passing

Hive

I think these failures are related to losing the filter during predicate pushdown:

io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate - Pushdown filter lost
io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned - Pushdown filter lost
io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete - Pushdown filter lost
io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete - Metadata delete failed

These failures look serious, because they might indicate a redistribution pattern that violates ORC rules:

io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
AlreadyBeingCreatedException: Failed to CREATE_FILE /user/hive/warehouse/test_test_unbucketed_partitioned_transactional_table_with_task_writer_count_greater_than_one_true_none_10dzu0qd5zgv/orderpriority=1-URGENT/delete_delta_0000002_0000002_0000/bucket_00000 for DFSClient_NONMAPREDUCE_-1152645189_44 on 172.18.0.4 because DFSClient_NONMAPREDUCE_-1152645189_44 is already the current lease holder.

io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting
AlreadyBeingCreatedException): Failed to CREATE_FILE /user/hive/warehouse/test_trino_update_full_acid_acid_converted_table_read_false_bucketed_default_rhwmijhyrkk4/delete_delta_10000005_10000005_0000/bucket_00000 for DFSClient_NONMAPREDUCE_-693512347_47 on 172.21.0.3 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-805639917_44 on 172.21.0.4

DeltaLake

Most of the failing Delta Lake tests fail because they check for specific files created or deleted. I'm unsure if that's valid given how different the delete and update mechanisms are now. It may be that the tests should pass but don't because predicate pushdown or metadata delete is failing.

Here are the test failures in that category:

io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum
io.trino.plugin.deltalake.TestDeltaLakeDatabricksConnectorTest.testAddColumnAndVacuum
io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllDatabricks
io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllOssDeltaLake
io.trino.plugin.deltalake.TestDeltaLakeLegacyWriterConnectorSmokeTest.testVacuum
io.trino.plugin.deltalake.TestDeltaLakeOssDeltaLakeConnectorTest.testAddColumnAndVacuum

This error looks serious:

io.trino.plugin.deltalake.TestDeltaLakeDelete.testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters
io.trino.spi.TrinoException: Unable to rewrite Parquet file
at io.trino.plugin.deltalake.DeltaLakeMergeSink.rewriteFile(DeltaLakeMergeSink.java:199)
at io.trino.plugin.deltalake.DeltaLakeMergeSink.lambda$finish$4(DeltaLakeMergeSink.java:167)
at java.base/java.util.HashMap.forEach(HashMap.java:1421)

These are tests in which predicate pushdown is failing:

io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown

Kudu

The only Kudu tests that are failing are below. They are currently undiagnosed:

io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently
io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateWithPredicates

djsstarburst avatar Sep 28 '22 15:09 djsstarburst

The latest complete list of failures, including duplicates:

./11_pt (default, suite-8-non-generic, ).txt     INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./11_pt (default, suite-8-non-generic, ).txt     INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./11_pt (default, suite-8-non-generic, ).txt     INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./11_pt (default, suite-8-non-generic, ).txt     INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete

./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete

./8_pt (hdp3, suite-5, ).txt:22780    INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting

./5_pt (hdp3, suite-1, ).txt:10988    INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./5_pt (hdp3, suite-1, ).txt:11325    INFO: FAILURE     /     io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./5_pt (hdp3, suite-1, ).txt:12712    INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
./5_pt (hdp3, suite-1, ).txt:13343    INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne

./pt (default, suite-7-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate

./pt (hdp3, suite-1, )/8_Product Tests.txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./pt (hdp3, suite-1, )/8_Product Tests.txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./pt (hdp3, suite-1, )/8_Product Tests.txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./pt (hdp3, suite-1, )/8_Product Tests.txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
./pt (hdp3, suite-1, )/8_Product Tests.txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
./pt (hdp3, suite-5, )/8_Product Tests.txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting

./10_pt (default, suite-7-non-generic, ).txt   INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate

./11_test (plugintrino-delta-lake).txt  ERROR	pool-3-thread-2		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum
./11_test (plugintrino-delta-lake).txt  ERROR	pool-3-thread-2		[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
./11_test (plugintrino-delta-lake).txt  ERROR	pool-3-thread-2		[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDatabricksConnectorTest.testAddColumnAndVacuum
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeOssDeltaLakeConnectorTest.testAddColumnAndVacuum
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllDatabricks
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllOssDeltaLake
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-1		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters
./11_test (plugintrino-delta-lake).txt	ERROR	pool-3-thread-2		[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeLegacyWriterConnectorSmokeTest.testVacuum

./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDatabricksConnectorTest.testAddColumnAndVacuum
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeOssDeltaLakeConnectorTest.testAddColumnAndVacuum
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllDatabricks
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllOssDeltaLake
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeDelete.testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeLegacyWriterConnectorSmokeTest.testVacuum

./26_test (plugintrino-kudu).txt	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
./26_test (plugintrino-kudu).txt	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

./test (plugintrino-kudu)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
./test (plugintrino-kudu)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

The latest complete list of failures, excluding duplicates:

io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate

io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting

io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum
io.trino.plugin.deltalake.TestDeltaLakeDatabricksConnectorTest.testAddColumnAndVacuum
io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllDatabricks
io.trino.plugin.deltalake.TestDeltaLakeDelete.testDeleteAllOssDeltaLake
io.trino.plugin.deltalake.TestDeltaLakeDelete.testTargetedDeleteWhenTableIsPartitionedWithColumnContainingSpecialCharacters
io.trino.plugin.deltalake.TestDeltaLakeLegacyWriterConnectorSmokeTest.testVacuum
io.trino.plugin.deltalake.TestDeltaLakeOssDeltaLakeConnectorTest.testAddColumnAndVacuum
io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown

io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

djsstarburst avatar Sep 28 '22 22:09 djsstarburst

Thanks for the great summary and continuing to trudge through this.

io.trino.plugin.deltalake.TestDeltaLakeConnectorSmokeTest.testVacuum

This seems to be the failing assertion:

assertThat(getAllDataFilesFromTableDirectory(tableName)).isEqualTo(union(initialFiles, updatedFiles));

When I manually sorted and inspected the sets, it appears that the actual data files set contains an additional data file for each partition (i.e. the regionkey=X part of the path) that is not part of the active files set.

Fortunately, this test runs in the IDE, so you can debug it. I'd start by adding logging in DeltaLakeMergeSink for the names of the insertion and rewritten files, and in DeltaLakeMetadata.finishMerge() where it accepts the set of new files and updates the transaction log.

electrum avatar Sep 28 '22 23:09 electrum

Delta Lake tests that test file counts after update are failing because DeltaLakeMergeSink.storeMergedRows is using MergePage.createDeleteAndInsertPages to break the changes up into insert and delete pages, and subsequently treating each separately, writing each to a different file.

This is wrong for Delta Lake, which is supposed to work by rewriting files. Instead, DeltaLakeMergeSink.storeMergedRows should be creating a single file, as is done in DeltaLakeUpdatablePageSource.

(Parenthetically, I guess Delta Lake MERGE is also incorrectly generating 2 files for each partition for UPDATE cases.)

It will take some thought to fix storeMergedRows. We can't do exactly what DeltaLakeUpdatablePageSource is doing because a single MERGE can contain both DELETE and UPDATE operations.

On reflection, it seems that the RowChangeParadigm for Delta Lake should be CHANGE_ONLY_UPDATED_COLUMNS rather than DELETE_ROW_AND_INSERT_ROW. There doesn't seem to be any value add in breaking updates into deletes and inserts for Delta Lake.

[Later]

I discussed the matter with @electrum, and he concluded that, assuming we aren't violating the Delta Lake spec, the code is functioning as it should and the tests should be changed. One curiosity that must be resolved is that when, for example, testVacuum fetches the "active files", it only sees 5 of the 10 new files.

djsstarburst avatar Sep 30 '22 13:09 djsstarburst

@alexjo2144 showed me how to decode Delta Lake files, and I quickly found the bug with tests like testVacuum: storing data files with zero rows. I added a commit to ensure that DeltaLakeMergeSink.rewriteParquetFile calls fileWriter.rollback() rather than fileWriter.commit() if there are no rows in the file. This commit fixed many of the failing Delta Lake tests.
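
The zero-row fix can be illustrated with a minimal sketch. Only commit() and rollback() come from the description above; the SketchFileWriter interface, the in-memory writer, and the rewrite helper are hypothetical stand-ins, not the actual DeltaLakeMergeSink code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the connector's file writer. Only commit() and
// rollback() are named in the discussion; the other methods are assumptions.
interface SketchFileWriter {
    void appendRow(String row);
    long getRowCount();
    void commit();
    void rollback();
}

// Trivial in-memory writer that only records what happened to the "file".
final class InMemoryFileWriter implements SketchFileWriter {
    private final List<String> rows = new ArrayList<>();
    private String state = "OPEN";

    @Override public void appendRow(String row) { rows.add(row); }
    @Override public long getRowCount() { return rows.size(); }
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
    String state() { return state; }
}

final class RewriteSketch {
    private RewriteSketch() {}

    // Copies the rows that survive the delete into the writer, then keeps the
    // new file only if it ended up with at least one row.
    static boolean rewrite(List<String> sourceRows, List<Boolean> deleted, SketchFileWriter writer) {
        for (int i = 0; i < sourceRows.size(); i++) {
            if (!deleted.get(i)) {
                writer.appendRow(sourceRows.get(i));
            }
        }
        if (writer.getRowCount() == 0) {
            writer.rollback(); // avoid leaving a zero-row data file behind
            return false;
        }
        writer.commit();
        return true;
    }
}
```

The boolean result lets the caller skip registering a file that was rolled back, so the set of new files handed to the transaction log would contain only files with data.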

Most of the remaining test failures, in Delta Lake and Hive, are due to losing the non-partition part of a filter when the partition part is pushed into the table scan.

This is the complete list of failing tests, including duplicates:

./11_pt (default, suite-8-non-generic, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./11_pt (default, suite-8-non-generic, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./11_pt (default, suite-8-non-generic, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./11_pt (default, suite-8-non-generic, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete

./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./pt (default, suite-8-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete

./8_pt (hdp3, suite-5, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting
./5_pt (hdp3, suite-1, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./5_pt (hdp3, suite-1, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./5_pt (hdp3, suite-1, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./5_pt (hdp3, suite-1, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
./5_pt (hdp3, suite-1, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne

./pt (default, suite-7-non-generic, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate

./pt (hdp3, suite-1, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
./pt (hdp3, suite-1, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
./pt (hdp3, suite-1, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
./pt (hdp3, suite-1, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
./pt (hdp3, suite-1, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
./pt (hdp3, suite-5, )/8_Product Tests.txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting

./10_pt (default, suite-7-non-generic, ).txt INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate


./7_test (plugintrino-delta-lake).txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
./7_test (plugintrino-delta-lake).txt	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
./7_test (plugintrino-delta-lake).txt	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown

./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
./test (plugintrino-delta-lake)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown

./test (plugintrino-kudu)/8_Maven Tests.txt	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

./17_test (plugintrino-kudu).txt	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

This is the complete list of failing tests excluding duplicates:

io.trino.plugin.deltalake.TestDeltaLakeUpdate.testUpdateWithPartitionKeyPredicate
io.trino.plugin.deltalake.TestPredicatePushdown.testDeletePushdown
io.trino.plugin.deltalake.TestPredicatePushdown.testUpdatePushdown

io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

io.trino.tests.product.hive.TestHiveRedirectionToIceberg.testUpdate
io.trino.tests.product.hive.TestHiveTransactionalTable.testAcidUpdatePartitioned
io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting
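
A deduplicated list like the one above can be derived mechanically from the raw CI log lines. The following is a minimal sketch, not the tooling actually used for this PR; the class name FailureDeduper and the regular expression are assumptions based only on the two log formats quoted in this thread ("INFO: FAILURE / <test>" and "[TEST FAILURE] <test>").

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FailureDeduper {
    // Captures the fully qualified test name that follows a FAILURE marker.
    // Anchoring on the marker skips other io.trino names on the same line,
    // such as the logging listener class.
    private static final Pattern FAILED_TEST =
            Pattern.compile("FAILURE\\]?\\s+(?:/\\s+)?(io\\.trino[\\w.]+)");

    private FailureDeduper() {}

    // Returns the distinct failing test names in sorted order.
    static SortedSet<String> uniqueFailures(List<String> logLines) {
        SortedSet<String> names = new TreeSet<>();
        for (String line : logLines) {
            Matcher matcher = FAILED_TEST.matcher(line);
            if (matcher.find()) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }
}
```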

djsstarburst avatar Oct 05 '22 03:10 djsstarburst

The most recent commit - - Fix predicate pushdown in merge - - fixes all of the pushdown problems. However, it breaks all the Kudu merge tests because the KuduRecordSet produced has -1 for the column index of the rowId column. Before this change, the KuduTableHandle used to construct it didn't have a rowId, but it was synthesized by KuduUpdatablePageSource. I'll need some time to figure out how to fix Kudu.

But the good news is, excluding Kudu, the list of failing tests, excluding duplicates, is getting very short:

io.trino.faulttolerant.iceberg.TestIcebergTaskFailureRecoveryTest.testMergePartitionedTable

io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar
io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete
io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete
io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting

djsstarburst avatar Oct 07 '22 18:10 djsstarburst

The TestHiveTransactionalTable failures seem to be caused by trying to write the same file name for the same bucket on multiple workers. To fix this, we need to implement getUpdateLayout() in HiveMetadata. We'll need a new HiveUpdateHandle that implements ConnectorPartitioningHandle, which is then used in HiveNodePartitioningProvider. The new HiveUpdateBucketFunction will need to distribute based on the ACID_COLUMN_BUCKET field in the row ID, similar to IcebergUpdateBucketFunction.
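
To make the idea concrete, here is a rough sketch of a bucket function that routes rows by the bucket field of the row ID. It does not use the real Trino SPI: AcidRowId and UpdateBucketSketch are hypothetical names, and the bucket field is treated as a plain bucket number, which is an assumption about the row ID encoding.

```java
// Hypothetical, simplified view of an ACID row ID; only the bucket field
// matters for distribution in this sketch.
final class AcidRowId {
    final long originalTransaction;
    final int bucket;
    final long rowId;

    AcidRowId(long originalTransaction, int bucket, long rowId) {
        this.originalTransaction = originalTransaction;
        this.bucket = bucket;
        this.rowId = rowId;
    }
}

final class UpdateBucketSketch {
    private final int partitionCount;

    UpdateBucketSketch(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // All rows with the same bucket value map to the same partition, so at
    // most one writer handles the files of any given bucket.
    int getBucket(AcidRowId rowId) {
        return Math.floorMod(rowId.bucket, partitionCount);
    }
}
```

Because the mapping depends only on the bucket value, two workers would never be asked to write the same bucket's file in this model.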

Edit: I'm not sure why MERGE works without this. We should be writing unique file names on each worker, otherwise we'd see the same problem with MERGE. Implementing this is the right thing to do for performance reasons, but I'm curious why this error occurs.

electrum avatar Oct 08 '22 02:10 electrum

The latest commit fixes the regression of Kudu tests. Here is the complete list of current test failures, not including duplicates. The next steps are to fix the Hive row count problems, and to work on the Hive tests, as suggested above by @electrum.


These tests get the wrong row count when selecting after delete. In each case, after a delete operation, running the query on Trino gets the expected row count, but running the query on Hive gets the wrong row count:

  • io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar - - Before deleting there are 6 rows. After Trino deletes a row, Trino gets the expected row count of 5, but Hive gets a row count of 7!
  • io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete [HIVE, TRINO] - - The only version of this test that fails is the one where Hive does the table inserts, and Trino does the deletes. For all other combinations of inserters and deleters, the test succeeds.
  • io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete [HIVE, TRINO] - - Like the previous test, the only version of this test that fails is the one where Hive does the table inserts, and Trino does the deletes. For all other combinations of inserters and deleters, the test succeeds.

These show the multiple-writers-for-one-file problem pointed out by @electrum:

io.trino.tests.product.hive.TestHiveTransactionalTable.testUnbucketedPartitionedTransactionalTableWithTaskWriterCountGreaterThanOne
io.trino.tests.product.hive.TestHiveTransactionalTable.testUpdateFullAcidWithOriginalFilesTrinoInsertingAndDeleting

This fault-tolerant test gets this exception before the injected exception is raised: "Insert and update layout have mismatched BucketNodeMap". We've seen this before, but we thought we had fixed it:

io.trino.faulttolerant.iceberg.TestIcebergTaskFailureRecoveryTest.testMergePartitionedTable

This test sometimes (though rarely) returns the wrong count of rows updated:

io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate

This test fails consistently for reasons yet to be diagnosed:

io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

djsstarburst avatar Oct 08 '22 14:10 djsstarburst

@electrum and I fixed bugs in the commit adding Hive update layout support, and I fixed a bug in comparing partition bucket maps. After these changes, we have only 6 test failures when duplicates are eliminated, 3 for Hive and 3 for Kudu.

Test failures after eliminating duplicates

The Hive test failures all have the same cause - - if Hive inserts rows and then Trino deletes them, Trino SELECTs get the right answer but Hive SELECTs do not. If Trino inserts and Trino deletes, or if Trino inserts and Hive deletes, both Hive and Trino SELECTs return the correct rows. This mysterious result is the next thing to debug:

io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar [HIVE, TRINO]
io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete [HIVE, TRINO]
io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete [HIVE, TRINO]

The Kudu test failures show up as an incorrect row count from an UPDATE operation. Investigation is needed to determine whether the correct rows were updated:

io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate
io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateAllValues
io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently

All test failures

./11_pt (default, suite-8-non-generic, ).txt:5188:2022-10-15T14:35:25.6403102Z tests               | 2022-10-15 20:20:25 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar [HIVE, TRINO] (Groups: hive_transactional) took 14.7 seconds
./11_pt (default, suite-8-non-generic, ).txt:6000:2022-10-15T14:44:47.5385193Z tests               | 2022-10-15 20:29:47 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete [HIVE, TRINO] (Groups: hive_transactional) took 20.0 seconds
./11_pt (default, suite-8-non-generic, ).txt:6586:2022-10-15T15:00:15.7478988Z tests               | 2022-10-15 20:45:15 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete [HIVE, TRINO] (Groups: hive_transactional) took 13.7 seconds
./pt (default, suite-8-non-generic, )/7_Product Tests.txt:4153:2022-10-15T14:35:25.6403083Z tests               | 2022-10-15 20:20:25 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar [HIVE, TRINO] (Groups: hive_transactional) took 14.7 seconds
./pt (default, suite-8-non-generic, )/7_Product Tests.txt:4965:2022-10-15T14:44:47.5385186Z tests               | 2022-10-15 20:29:47 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete [HIVE, TRINO] (Groups: hive_transactional) took 20.0 seconds
./pt (default, suite-8-non-generic, )/7_Product Tests.txt:5551:2022-10-15T15:00:15.7478968Z tests               | 2022-10-15 20:45:15 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete [HIVE, TRINO] (Groups: hive_transactional) took 13.7 seconds
./5_pt (hdp3, suite-1, ).txt:11050:2022-10-15T14:58:27.9357103Z tests               | 2022-10-15 20:43:27 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar [HIVE, TRINO] (Groups: hive_transactional) took 11.7 seconds
./5_pt (hdp3, suite-1, ).txt:11926:2022-10-15T15:07:57.4769582Z tests               | 2022-10-15 20:52:57 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete [HIVE, TRINO] (Groups: hive_transactional) took 16.9 seconds
./5_pt (hdp3, suite-1, ).txt:12398:2022-10-15T15:19:15.9009378Z tests               | 2022-10-15 21:04:15 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete [HIVE, TRINO] (Groups: hive_transactional) took 12.1 seconds
./pt (hdp3, suite-1, )/7_Product Tests.txt:10020:2022-10-15T14:58:27.9357083Z tests               | 2022-10-15 20:43:27 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testCorrectSelectCountStar [HIVE, TRINO] (Groups: hive_transactional) took 11.7 seconds
./pt (hdp3, suite-1, )/7_Product Tests.txt:10896:2022-10-15T15:07:57.4769558Z tests               | 2022-10-15 20:52:57 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testPartitionedInsertAndRowLevelDelete [HIVE, TRINO] (Groups: hive_transactional) took 16.9 seconds
./pt (hdp3, suite-1, )/7_Product Tests.txt:11368:2022-10-15T15:19:15.9009346Z tests               | 2022-10-15 21:04:15 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testTransactionalMetadataDelete [HIVE, TRINO] (Groups: hive_transactional) took 12.1 seconds
./test (plugintrino-kudu)/8_Maven Tests.txt:4700:2022-10-15T14:25:25.1232517Z 2022-10-15T09:25:25.101-0500	ERROR	pool-3-thread-2	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate; (took: 1.2 seconds)
./test (plugintrino-kudu)/8_Maven Tests.txt:4732:2022-10-15T14:25:25.3350766Z 2022-10-15T09:25:25.250-0500	ERROR	pool-3-thread-1	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateAllValues; (took: 1.0 seconds)
./test (plugintrino-kudu)/8_Maven Tests.txt:4773:2022-10-15T14:25:26.3559803Z 2022-10-15T09:25:26.346-0500	ERROR	pool-3-thread-2	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently; (took: 1.2 seconds)
./17_test (plugintrino-kudu).txt:5719:2022-10-15T14:25:25.1232523Z 2022-10-15T09:25:25.101-0500	ERROR	pool-3-thread-2	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdate; (took: 1.2 seconds)
./17_test (plugintrino-kudu).txt:5751:2022-10-15T14:25:25.3350991Z 2022-10-15T09:25:25.250-0500	ERROR	pool-3-thread-1	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateAllValues; (took: 1.0 seconds)
./17_test (plugintrino-kudu).txt:5792:2022-10-15T14:25:26.3559821Z 2022-10-15T09:25:26.346-0500	ERROR	pool-3-thread-2	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.kudu.TestKuduConnectorTest.testUpdateRowConcurrently; (took: 1.2 seconds)

djsstarburst avatar Oct 15 '22 16:10 djsstarburst

Investigating the Hive delete test failures described above, in which a Hive SELECT after a Hive INSERT and a Trino DELETE gets the wrong answer, I captured the ORC files for the successful test in which Trino inserts and Trino deletes, and compared them to the ORC files for the failing case of Hive inserting and Trino deleting. The decoded data files are identical. The main difference I see in the metadata is that in all cases, when Hive inserts, it marks the columns with hasNull: false, whereas in all cases the Trino insert marks the columns with hasNull: true.

This gzip archive contains the ORC files and decoded data and metadata for the failing test TestHiveTransactionalTable.testCorrectSelectCountStar in three cases: Hive inserting/Trino deleting, Trino inserting/Trino deleting, and Hive inserting/Hive deleting.

djsstarburst avatar Oct 15 '22 17:10 djsstarburst

The SQL MERGE fixes PR has been merged, so I rebuilt this PR on master, which removed a dozen commits since they were already in master. Hopefully the process did not introduce regressions.

Yesterday @electrum and I spent quite a while investigating the 3 remaining Hive tests that fail. I thought that perhaps delete was deleting the wrong rows, but it turns out that the problem is that the count of rows in the file statistics is incorrect when delete (and maybe update) are performed by the merge machinery. Fixing that problem is the next task.

djsstarburst avatar Oct 18 '22 20:10 djsstarburst

I added a commit that changed Hive's MergeFileWriter to set the rowCount to insertRowCount - deleteRowCount, which is negative for operations that delete more rows than they insert. That fixed the failing Hive tests, and changes in PR #14650 fixed the two failing Kudu tests, TestKuduConnectorTest.testUpdate and TestKuduConnectorTest.testUpdateAllValues.
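
The arithmetic behind that row count can be sketched as follows. This is an illustration only: MergeRowCountSketch is a hypothetical name, and the idea that a reader sums per-file counts to get a table count is an assumption made for the example, not a statement about how Hive computes statistics.

```java
final class MergeRowCountSketch {
    private MergeRowCountSketch() {}

    // Net change in row count contributed by one merge write:
    // rows inserted minus rows deleted. The result is negative when the
    // operation deletes more rows than it inserts.
    static long netRowCount(long insertRowCount, long deleteRowCount) {
        if (insertRowCount < 0 || deleteRowCount < 0) {
            throw new IllegalArgumentException("row counts must be non-negative");
        }
        return insertRowCount - deleteRowCount;
    }
}
```

For the testCorrectSelectCountStar scenario described earlier, a pure delete of one row gives netRowCount(0, 1) = -1, and 6 existing rows plus -1 yields the expected count of 5.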

The only test that is still failing in this branch is TestKuduConnectorTest.testUpdateRowConcurrently. Woohoo!

djsstarburst avatar Oct 19 '22 19:10 djsstarburst

@electrum and I determined that TestKuduConnectorTest.testUpdateRowConcurrently, which I added in this series, can't ever succeed in Kudu, because the Kudu concurrency semantics don't support different agents updating different columns of the same row. I removed that test.

I squashed all the commits after the QueryPlanner commits into a single commit that updates both the connectors and the tests, and switches planning of DELETE and UPDATE to use the merge machinery.

TestKuduConnectorTest.testUpdate and TestKuduConnectorTest.testUpdateAllValues both ran successfully in the last build, and I was hoping they were really fixed. Sadly, they failed in the latest build. That needs to be debugged.

djsstarburst avatar Oct 20 '22 00:10 djsstarburst

The reason the two Kudu update tests are failing is that a row gets sent to KuduPageSink.storeMergedRows twice. Moreover, the second copy of the row is the row as it looks after the first update, so that row gets updated twice. The logs captured below tell the story. I was able to reproduce the failure using mvn surefire:test about once every 10 runs.

I wrote the equivalent test for Hive, and ran it 30 times with no failures.

The test program does this:

CREATE TABLE tablename (a INT, b INT, c INT) WITH (partition_by_hash_columns = ARRAY['a'], partition_by_hash_buckets = 2);
INSERT INTO tablename VALUES (1, 2, 3), (11, 12, 13), (21, 22, 23);
UPDATE tablename SET a = a + 1, b = b - 1, c = c * 2;

SELECT * FROM tablename; should be VALUES (2, 1, 6), (12, 11, 26), (22, 21, 46);

Correct results:

2022-10-30T11:33:55.421	storeMergedRows page Page[positions=1 0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[3], 4:VarWidth["�"]]
2022-10-30T11:33:55.421	storeMergedRows page Page[positions=2 0:Int[2, 12], 1:Int[1, 11], 2:Int[6, 26], 3:Byte[3, 3], 4:VarWidth["�", "�     "]]

Incorrect results:

2022-10-30T11:34:51.467	storeMergedRows page Page[positions=1 0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[3], 4:VarWidth["�"]]
2022-10-30T11:34:51.497	storeMergedRows page Page[positions=3 0:Int[2, 12, 23], 1:Int[1, 11, 20], 2:Int[6, 26, 92], 3:Byte[3, 3, 3], 4:VarWidth["�", "�     ", "�"]]

When bad, get java.lang.AssertionError: update count expected [3] but found [4]
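
The extra row in the bad page is exactly what the UPDATE produces when applied a second time to an already-updated row. A small sketch of the arithmetic (DoubleUpdateSketch is a hypothetical helper, not part of the test):

```java
final class DoubleUpdateSketch {
    private DoubleUpdateSketch() {}

    // Applies SET a = a + 1, b = b - 1, c = c * 2 to one row {a, b, c}.
    static int[] applyUpdate(int[] row) {
        return new int[] {row[0] + 1, row[1] - 1, row[2] * 2};
    }
}
```

Starting from (21, 22, 23), one application gives (22, 21, 46), the expected value. A second application gives (23, 20, 92), which matches the third position of the bad page and accounts for the update count of 4 instead of 3.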

Good and bad first call:

2022-10-30T11:31:57.692	storeMergedRows page Page[positions=1
    0:Int[22],
    1:Int[21],
    2:Int[46],
    3:Byte[3],
    4:VarWidth["�"]
]

Good second call:

2022-10-30T11:33:55.421	storeMergedRows page Page[positions=2
    0:Int[2, 12],
    1:Int[1, 11],
    2:Int[6, 26],
    3:Byte[3, 3],
    4:VarWidth["�", "�     "]
]

Bad second call:

2022-10-30T11:34:51.497	storeMergedRows page Page[positions=3
    0:Int[2, 12, 23],
    1:Int[1, 11, 20],
    2:Int[6, 26, 92],
    3:Byte[3, 3, 3],
    4:VarWidth["�", "�     ", "�"]
]

I added more logging, and here are the correct and incorrect results:

Correct results:

io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage inputPage Page[positions=1 0:VarWidth["�"], 1:Row[0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[1], 4:Byte[3], 5:Int[0]], 2:Int[0], 3:Byte[1]]
io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage defaultCaseCount == 0 result Page[positions=1 0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[3], 4:VarWidth["�"], 5:Byte[0]]
io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage inputPage Page[positions=2 0:VarWidth["�", "�     "], 1:Row[0:Int[2, 12], 1:Int[1, 11], 2:Int[6, 26], 3:Byte[1, 1], 4:Byte[3, 3], 5:Int[0, 0]], 2:Int[0, 0], 3:Byte[1, 1]]
io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage defaultCaseCount == 0 result Page[positions=2 0:Int[2, 12], 1:Int[1, 11], 2:Int[6, 26], 3:Byte[3, 3], 4:VarWidth["�", "�     "], 5:RLE[2@Byte[0]]]
io.trino.plugin.kudu.KuduPageSink	storeMergedRows page Page[positions=1 0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[3], 4:VarWidth["�"]]
io.trino.plugin.kudu.KuduPageSink	storeMergedRows page Page[positions=2 0:Int[2, 12], 1:Int[1, 11], 2:Int[6, 26], 3:Byte[3, 3], 4:VarWidth["�", "�     "]]
io.trino.plugin.kudu.KuduPageSink	Delete for position 0: (int32 a=21), row ()
io.trino.plugin.kudu.KuduPageSink	Delete for position 0: (int32 a=1), row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 0, row ()
io.trino.plugin.kudu.KuduPageSink	Delete for position 1: (int32 a=11), row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 1, row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 0, row ()

Incorrect results:

io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage inputPage Page[positions=1 0:VarWidth["�"], 1:Row[0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[1], 4:Byte[3], 5:Int[0]], 2:Int[0], 3:Byte[1]]
io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage defaultCaseCount == 0 result Page[positions=1 0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[3], 4:VarWidth["�"], 5:Byte[0]]
io.trino.plugin.kudu.KuduPageSink	storeMergedRows page Page[positions=1 0:Int[22], 1:Int[21], 2:Int[46], 3:Byte[3], 4:VarWidth["�"]]
io.trino.plugin.kudu.KuduPageSink	Delete for position 0: (int32 a=21), row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 0, row ()
io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage inputPage Page[positions=3 0:VarWidth["�", "�     ", "�"], 1:Row[0:Int[2, 12, 23], 1:Int[1, 11, 20], 2:Int[6, 26, 92], 3:Byte[1, 1, 1], 4:Byte[3, 3, 3], 5:Int[0, 0, 0]], 2:Int[0, 0, 0], 3:Byte[1, 1, 1]]
io.trino.operator.ChangeOnlyUpdatedColumnsMergeProcessor	transformPage defaultCaseCount == 0 result Page[positions=3 0:Int[2, 12, 23], 1:Int[1, 11, 20], 2:Int[6, 26, 92], 3:Byte[3, 3, 3], 4:VarWidth["�", "�     ", "�"], 5:RLE[3@Byte[0]]]
io.trino.plugin.kudu.KuduPageSink	storeMergedRows page Page[positions=3 0:Int[2, 12, 23], 1:Int[1, 11, 20], 2:Int[6, 26, 92], 3:Byte[3, 3, 3], 4:VarWidth["�", "�     ", "�"]]
io.trino.plugin.kudu.KuduPageSink	Delete for position 0: (int32 a=1), row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 0, row ()
io.trino.plugin.kudu.KuduPageSink	Delete for position 1: (int32 a=11), row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 1, row ()
io.trino.plugin.kudu.KuduPageSink	Delete for position 2: (int32 a=22), row ()
io.trino.plugin.kudu.KuduPageSink	Insert for position 2, row ()

djsstarburst avatar Oct 30 '22 21:10 djsstarburst

@electrum and I looked hard at the failing Kudu update tests and determined that they fail intermittently because Kudu read semantics are not strong enough to ensure that a row in a tablet was not updated by a different writer. I've disabled those two tests with comments explaining the problem.

djsstarburst avatar Nov 01 '22 21:11 djsstarburst

Several of the SQL MERGE tests run by the brand-new TestIcebergParquetFaultTolerantExecutionConnectorTest are failing with the familiar error "Insert and update layout have mismatched BucketNodeMap", in MergePartitioningHandle.getFaultTolerantScheme. The failures are readily reproducible in the IDE.

I've taken a close look at the failures of testMergeMultipleOperations, which is run for various writer counts and partition and bucket configurations of the target table.

The failures are all cases with the phrase WITH partitioning = ARRAY['column']. The cases that pass have the phrase WITH partitioning = ARRAY['bucket(column, n)']. The count of writers doesn't matter in the failures. Here is a screen shot of the insert and update layouts in the failing case of one writer and WITH partitioning = ARRAY['customer']:

[Image: Screen Shot 2022-11-04 at 7 27 16 AM]

djsstarburst avatar Nov 04 '22 14:11 djsstarburst

The CI build shows a compilation error:

Error:  /home/runner/work/trino/trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java:[1202,42] cannot find symbol
Error:    symbol:   class ArrayList
Error:    location: class io.trino.plugin.iceberg.IcebergMetadata
Error:  -> [Help 1]

But IcebergMetadata doesn't even refer to ArrayList, and line 1202 doesn't contain any code, as verified by looking directly at the origin branch: https://github.com/djsstarburst/trino/blob/david.stryker/delete-and-update-on-merge/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

What is going on?

djsstarburst avatar Nov 05 '22 13:11 djsstarburst

@electrum looked at the TestIcebergParquetFaultTolerantExecutionConnectorTest failures, and determined that the proximate cause of the failure was the expectation that FaultTolerantPartitioningSchemeFactory.create will always return the same FaultTolerantPartitioningScheme for a supplied PartitioningHandle. This is accomplished for non-fault-tolerant code by caching the previously-returned result, as is done in NodePartitioningManager.getNodePartitioningMap.

So absent some larger-level restructuring, we need to add the same sort of caching in FaultTolerantPartitioningSchemeFactory.
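
The kind of caching meant here can be sketched generically. This is not the FaultTolerantPartitioningSchemeFactory code: CachingSchemeFactory and its type parameters are hypothetical, and the sketch assumes the handle type has stable equals() and hashCode() so it can serve as a map key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// H stands in for a partitioning handle, S for the scheme built from it.
final class CachingSchemeFactory<H, S> {
    private final Map<H, S> cache = new ConcurrentHashMap<>();
    private final Function<H, S> delegate;

    CachingSchemeFactory(Function<H, S> delegate) {
        this.delegate = delegate;
    }

    // Returns the scheme previously created for this handle, building it via
    // the delegate only on the first request.
    S create(H handle) {
        return cache.computeIfAbsent(handle, delegate);
    }
}
```

With this shape, the insert and update layouts of a MERGE would see the same scheme instance whenever they pass equal handles, which is the property the failing check expects.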

djsstarburst avatar Nov 07 '22 21:11 djsstarburst

Update on the status of the PR:

  • @electrum determined that the root cause of the test failures in TestIcebergParquetFaultTolerantExecutionConnectorTest was the special handling of fault-tolerant operations in MergePartitioningHandle. Moreover, after he removed that special handling, all tests passed.
  • @martint made the point that for at least a while after this PR is merged, we will need the ability at Trino startup to switch back to the old implementation of delete and update. So @electrum added FeatureConfig.legacyUpdateDeleteImplementation, defaulting to false, to control whether the legacy implementation would be used.
  • Finally, @electrum eliminated for now the last two commits which removed the SPI and connector support for legacy delete and update.
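
As a rough illustration of the startup switch described in the second bullet, the sketch below shows a boolean config, defaulting to false, selecting between the two implementations. Only the property name legacyUpdateDeleteImplementation and its default come from the discussion; the classes, the enum, and the method names are hypothetical.

```java
// Hypothetical stand-in for the relevant part of FeatureConfig.
final class FeatureConfigSketch {
    // Defaults to false, so the merge-based path is used unless overridden.
    private boolean legacyUpdateDeleteImplementation;

    boolean isLegacyUpdateDeleteImplementation() {
        return legacyUpdateDeleteImplementation;
    }

    FeatureConfigSketch setLegacyUpdateDeleteImplementation(boolean value) {
        this.legacyUpdateDeleteImplementation = value;
        return this;
    }
}

final class RowChangePlannerSketch {
    enum Strategy { LEGACY, MERGE }

    private RowChangePlannerSketch() {}

    // Chooses how DELETE and UPDATE are planned based on the config flag.
    static Strategy chooseStrategy(FeatureConfigSketch config) {
        return config.isLegacyUpdateDeleteImplementation() ? Strategy.LEGACY : Strategy.MERGE;
    }
}
```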

djsstarburst avatar Nov 19 '22 15:11 djsstarburst