[HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5

Open majian1998 opened this issue 1 year ago • 3 comments

In Hudi, Spark3_5Adapter calls v2.v1Table, which in turn invokes logic inside Delta. When this runs against a Delta table, it can fail with an error. This change therefore decides whether a table is a Hudi table by checking its class name first, so that v1Table is not called on Delta tables.

Running the Delta tests on Spark 3.5 reports the following error:

[DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
org.apache.spark.sql.delta.DeltaIllegalStateException: [DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
    at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall(DeltaErrors.scala:1801)
    at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall$(DeltaErrors.scala:1800)
    at org.apache.spark.sql.delta.DeltaErrors$.invalidV1TableCall(DeltaErrors.scala:3203)
    at org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$v1Table$1(DeltaTableV2.scala:320)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.delta.catalog.DeltaTableV2.v1Table(DeltaTableV2.scala:320)
    at org.apache.spark.sql.adapter.Spark3_5Adapter.$anonfun$resolveHoodieTable$1(Spark3_5Adapter.scala:57)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.sql.adapter.Spark3_5Adapter.resolveHoodieTable(Spark3_5Adapter.scala:52)
    at org.apache.spark.sql.hudi.analysis.HoodieAnalysis$ResolvesToHudiTable$.unapply(HoodieAnalysis.scala:362)
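The idea of guarding the v1Table call with a class-name check can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: the trait `TableWithV1Fallback`, the two table classes, the `String` return type, and the `HudiV2ClassSuffix` constant are all hypothetical stand-ins, so the snippet runs without Spark, Delta, or Hudi on the classpath.

```scala
object ClassNameGuardSketch {
  // Hypothetical stand-in for a V2 table that can fall back to a V1 table.
  trait TableWithV1Fallback { def v1Table: String }

  // Stand-in for a Hudi V2 table: calling v1Table is safe here.
  class HoodieInternalV2Table extends TableWithV1Fallback {
    def v1Table: String = "hudi_catalog_table"
  }

  // Stand-in for a path-based Delta table: calling v1Table throws,
  // mirroring the DELTA_INVALID_V1_TABLE_CALL error in the report above.
  class DeltaTableV2 extends TableWithV1Fallback {
    def v1Table: String = throw new IllegalStateException(
      "[DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2")
  }

  // Assumption: the Hudi table class can be recognised by its name suffix.
  private val HudiV2ClassSuffix = "HoodieInternalV2Table"

  // Check the class name first and only then touch v1Table,
  // so a non-Hudi table never has v1Table invoked on it.
  def resolveHoodieTable(table: TableWithV1Fallback): Option[String] =
    if (table.getClass.getName.endsWith(HudiV2ClassSuffix)) Some(table.v1Table)
    else None
}
```

With this ordering, passing the Delta stand-in returns `None` instead of throwing, because the name check short-circuits before `v1Table` is evaluated.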

Change Logs

none

Impact

none

Risk level (write none, low, medium or high below)

none

Documentation Update

None

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

majian1998 • May 15 '24 09:05

When executed on a Delta table, this may result in an error.

What action are we executing here?

danny0405 • May 16 '24 00:05

When executed on a Delta table, this may result in an error.

What action are we executing here?

For example, INSERT OVERWRITE delta.`/tmp/delta-table` SELECT col1 as id FROM VALUES 5,6,7,8,9; from https://docs.delta.io/latest/quick-start.html

We internally use HoodieCatalog to handle Delta tables and other table types. However, Hudi's Spark3_5Adapter (hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala) calls v1Table even when the table is a Delta table, and Delta then throws an exception. v1Table should not be called when the table is not a Hudi table.

leesf • May 22 '24 10:05

CI report:

  • 0c01b0781e8c49da0f07a2379050c2be204cf373 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure re-run the last Azure build

hudi-bot • May 28 '24 23:05