hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-24933: Replication fails for transactional tables having same na…

Open jhungund opened this issue 3 years ago • 7 comments

…me as dropped external tables.

Change-Id: I79ed804617dcdadb51f961a933f4023ac0b6f509

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

jhungund avatar Jul 13 '22 10:07 jhungund

recheck

sshiwalkar avatar Jul 19 '22 16:07 sshiwalkar

The test re-run succeeded.

Thanks, Janardhan

jhungund avatar Jul 20 '22 10:07 jhungund

Can you add an analysis? Like the reason for the bug and how you are fixing it?

From the changes, I can infer that the access is "deferred", moved from occurring semantic analysis "load table" time, to later when the task actually runs and does the copy.

But what necessitates needing this? Did some state change between the two points?

Is this the significant change?

table = ImportSemanticAnalyzer.tableIfExists(tblDesc, hive); if (table == null) { table = ImportSemanticAnalyzer.createNewTableMetadataObject(tblDesc, true); }

So at deferred time, the copy/move/etc is only done if the table exists?

Just wondering if you can pass runnable/callable/function from ImportSemanticAnaylzer, so to keep all that current logic in the same place and not moved to another file. Then the DeferredWorkHelperImpl just needs to execute it, and logic can be removed from there.

cmunkey avatar Jul 22 '22 18:07 cmunkey

Summary of the change:

While setting up the tasks during the repl-load phase of the replication, delay the access to the metadata from the task creation phase to the task execution.

Root Cause Analysis

Background: During the incremental load phase of replication, all event logs are processed sequentially. One or more tasks are additionally spawned/created during the processing of each event. All the spawned tasks are also, subsequently, executed sequentially.

Scenario of the issue: The issue is seen in the following scenario:

  1. An external table(Eg. T1) is already replicated to the target cluster from the source cluster during the earlier replication cycles.
  2. This external table is dropped.
  3. A new managed table with the same name (T1) is recreated.

Root cause:

  1. The above mentioned operations (table drop and recreation) are propagated to the target cluster via the event logs during the subsequent incremental phase of replication.
  2. We create tasks to drop the old external tables for drop table event.
  3. We also create new tasks to create and load the table for the new table.
  4. Additionally, some additional events are logged which create tasks to load the table.
  5. During the creation of these load-table tasks, we try to access the metadata corresponding to the new table from the metadata store. In normal scenario of a fresh table creation, the metadata store will not have data corresponding to the new table (yet to be created). However, in this scenario, the old table still exists and hence, we end up using the metadata corresponding to old table. We try to use this metadata to create the load tasks for the new table. During the execution of these load tasks, which subsequently execute after the drop and recreate tasks, we find that the metadata set in the task context is stale and is inconsistent with the newly created table. Hence, the error.

Fix: Do not access the metadata during the task creation for load table. Instead, access the metadata during the task execution. By that time, the metadata is updated to the latest state by the previously executed tasks.

jhungund avatar Jul 28 '22 10:07 jhungund

Hi Francis, Thank you for taking a look at this change. Below are my replies to your comments (starting with [JH]) below

Can you add an analysis? Like the reason for the bug and how you are fixing it?

[JH] I have added a detailed comment about the root cause analysis. Please take a look.

From the changes, I can infer that the access is "deferred", moved from occurring semantic analysis "load table" time, to later when the task actually runs and does the copy. [JH] Yes, this is right.

But what necessitates needing this? Did some state change between the two points? [JH] Please check the detailed comment added earlier.

Is this the significant change?

table = ImportSemanticAnalyzer.tableIfExists(tblDesc, hive); if (table == null) { table = ImportSemanticAnalyzer.createNewTableMetadataObject(tblDesc, true); }

[JH] Yes, the access and the subsequent check: if (AcidUtils.isTransactionalTable(table)) { returns different results at task creation time and task execution time.

So at deferred time, the copy/move/etc is only done if the table exists?

Just wondering if you can pass runnable/callable/function from ImportSemanticAnaylzer, so to keep all that current logic in the same place and not moved to another file. Then the DeferredWorkHelperImpl just needs to execute it, and logic can be removed from there.

[JH] The current logic of moving code in deferred implementation simplifies the callers which only need to set the deferred implementation class. This keeps the logic at the same place as the execution. Hence, it looks OK to me. Do let me know, if it looks ok. If not, I can try to change it as per your comment.

Thanks, Janardhan

jhungund avatar Jul 28 '22 10:07 jhungund

Yes, I did not know this was an exiting PR that was resurrected. Yes, I thought that keeping table logic where it was was more logical, DeferredWorkHelperImpl doesn't need to know about tables, and doesn't need to be changed in the future if we find more of these issues where we look at an object in different places and get different results.

Otherwise, seems fine.

cmunkey avatar Jul 28 '22 22:07 cmunkey

Yes, I did not know this was an exiting PR that was resurrected. Yes, I thought that keeping table logic where it was was more logical, DeferredWorkHelperImpl doesn't need to know about tables, and doesn't need to be changed in the future if we find more of these issues where we look at an object in different places and get different results.

Otherwise, seems fine.

Thanks Francis. I have tried to address your comment in the latest patchset. Please take a look.

Thanks, Janardhan

jhungund avatar Aug 02 '22 09:08 jhungund