
[SUPPORT] Hudi should rollback commit when metasync fails

Open parisni opened this issue 1 year ago • 6 comments

hudi 0.15.0

AFAIK, when metasync (Hive/Glue...) fails, Hudi still commits the data. The current sequence is:

  1. hudi commit
  2. metasync action
  3. return failure status (e.g. to an `insert into` command)

So right now the user is aware that there was a failure. However, the Hudi table is still committed.

Proposal

I propose that, right after a metasync failure, a rollback of the current instant is performed.
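To make the proposed ordering concrete, here is a minimal toy sketch. It is not Hudi code: `ToyTable`, `MetaSyncError`, `write_with_rollback` and `failing_sync` are hypothetical names, and the list only stands in for the Hudi timeline.

```python
from dataclasses import dataclass, field


class MetaSyncError(Exception):
    """Raised by the (hypothetical) catalog sync step."""


@dataclass
class ToyTable:
    # Completed instants, oldest first; a stand-in for the Hudi timeline.
    timeline: list = field(default_factory=list)

    def commit(self, instant: str) -> None:
        self.timeline.append(instant)

    def rollback(self, instant: str) -> None:
        self.timeline.remove(instant)


def write_with_rollback(table, instant, meta_sync):
    """Commit, then sync; undo the commit if the sync fails."""
    table.commit(instant)            # 1. hudi commit
    try:
        meta_sync(instant)           # 2. metasync action
    except MetaSyncError:
        table.rollback(instant)      # proposed: roll back the current instant
        raise                        # 3. the failure is still returned to the caller


def failing_sync(instant):
    # Simulates a catalog that rejects the new schema.
    raise MetaSyncError(f"catalog rejected schema for {instant}")
```

With this ordering the caller still sees the failure, but the failed instant no longer appears in the toy timeline.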

Scenario

Let's say we promoted a column type (int -> string), but Athena does not support that promotion. The Glue metasync should then raise an error.

  1. If the commit is rolled back, then in the worst case the metastore stays in a corrupted state (table partially updated). => The table can still be read from Athena.
  2. However, if it is not rolled back, both the metastore and the Hudi table end up in a corrupted state: the Hudi column is promoted to string, but the metastore cannot support it. The user then has to roll back manually. => The table can no longer be read from Athena.
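The two outcomes can be illustrated with a toy reader. This is only a sketch under assumptions: `SUPPORTED_PROMOTIONS` and `can_read` are hypothetical, the allowed promotions are picked for illustration, and the partially updated metastore of case 1 is simplified to an unchanged `int` column. The only behaviour taken from this issue is that int -> string is not supported.

```python
# Promotions the toy reader accepts (illustrative assumption).
# int -> string is deliberately absent, as reported for Athena in this issue.
SUPPORTED_PROMOTIONS = {("int", "long"), ("float", "double")}


def can_read(catalog_type: str, file_types: list) -> bool:
    """True if every data file's column type can be read as catalog_type."""
    return all(
        ft == catalog_type or (ft, catalog_type) in SUPPORTED_PROMOTIONS
        for ft in file_types
    )


# Case 1: commit rolled back, so data files and catalog both still say int.
rolled_back = can_read("int", ["int", "int"])

# Case 2: commit kept, the newest file holds string but the catalog stayed int.
not_rolled_back = can_read("int", ["int", "string"])

# Forcing the catalog to string does not help: the older int files now mismatch.
catalog_recreated = can_read("string", ["int", "string"])
```

In this model only case 1 stays readable, which is the argument for rolling back the instant rather than only repairing the catalog.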

parisni avatar Jul 29 '24 16:07 parisni

> I propose that, right after a metasync failure, a rollback of the current instant is performed.

Hudi is designed so that the transaction and the meta sync are decoupled. Users can read a table through its base path with or without meta / catalog sync.

> Let's say we promoted a column type (int -> string), but Athena does not support that promotion. The Glue metasync should then raise an error.

Hudi Glue sync supports recreating the table in the catalog when the schema change is not allowed by the existing table metadata in the catalog. Will that solve your problem?

yihua avatar Aug 09 '24 19:08 yihua

See #11451.

yihua avatar Aug 09 '24 19:08 yihua

Recreating the Glue metadata won't solve the issue. Athena will fail to merge the Parquet schemas, since it simply does not support this promotion. If we roll back only the Glue schema, Athena will then try to read string as int.

parisni avatar Aug 10 '24 14:08 parisni

> Recreating the Glue metadata won't solve the issue. Athena will fail to merge the Parquet schemas, since it simply does not support this promotion. If we roll back only the Glue schema, Athena will then try to read string as int.

Got it. The Spark reader works in the case of type promotion. Is this specific to Athena's support for type promotion in Hudi? It would be good to raise a ticket with Athena. cc @CTTY

yihua avatar Aug 14 '24 17:08 yihua

Yes, I think this can be fixed in two ways: [1] As mentioned in this thread, we add an auto rollback in Hudi when metasync fails. [2] Raise a ticket with Athena and have Athena support such type promotion (please reach out to your AWS support contact for this).

These two solutions don't block each other, so we can do both. [1] would be better if we put the auto rollback feature behind a Hudi config.
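A config gate for [1] could look roughly like the sketch below. The key `hoodie.meta.sync.rollback.on.failure` is a hypothetical name made up for illustration, not an existing Hudi config, and `should_rollback` / `handle_sync_failure` are likewise hypothetical helpers.

```python
# Hypothetical key, NOT an existing Hudi config; used only for illustration.
ROLLBACK_ON_SYNC_FAILURE = "hoodie.meta.sync.rollback.on.failure"


def should_rollback(props: dict) -> bool:
    """Off by default, so the current behaviour (commit kept) is unchanged."""
    return str(props.get(ROLLBACK_ON_SYNC_FAILURE, "false")).lower() == "true"


def handle_sync_failure(props: dict, instant: str, rollback) -> str:
    """Called after a metasync failure; rolls back only when opted in."""
    if should_rollback(props):
        rollback(instant)
        return "rolled_back"
    return "commit_kept"
```

Defaulting to off keeps commit and meta sync decoupled for users who read the table through the base path, while users who depend on the catalog can opt in.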

CTTY avatar Aug 14 '24 17:08 CTTY

I requested this feature from both Athena and Redshift Spectrum a few weeks ago. BTW, I will make the feature configurable as suggested.

parisni avatar Aug 15 '24 14:08 parisni