[HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5
In Hudi, the Spark3_5Adapter calls v2.v1Table, which in turn invokes Delta's own logic. When executed against a Delta table, this can throw an exception. This change therefore switches the Hudi-table detection to a class name check, so that Delta Lake executions no longer hit the error.
When executing the Spark 3.5 Delta test, the following error is reported:

```
org.apache.spark.sql.delta.DeltaIllegalStateException: [DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
	at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall(DeltaErrors.scala:1801)
	at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall$(DeltaErrors.scala:1800)
	at org.apache.spark.sql.delta.DeltaErrors$.invalidV1TableCall(DeltaErrors.scala:3203)
	at org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$v1Table$1(DeltaTableV2.scala:320)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.delta.catalog.DeltaTableV2.v1Table(DeltaTableV2.scala:320)
	at org.apache.spark.sql.adapter.Spark3_5Adapter.$anonfun$resolveHoodieTable$1(Spark3_5Adapter.scala:57)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.adapter.Spark3_5Adapter.resolveHoodieTable(Spark3_5Adapter.scala:52)
	at org.apache.spark.sql.hudi.analysis.HoodieAnalysis$ResolvesToHudiTable$.unapply(HoodieAnalysis.scala:362)
```
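The idea behind the fix can be sketched as follows. This is a hypothetical, self-contained illustration (the class names `TableCheckSketch`, the stand-in `Table` trait, and the stub table classes are not the actual Hudi or Spark code): instead of calling `v1Table` on every V2 table, which path-based `DeltaTableV2` instances forbid, the adapter first inspects the class name of the table implementation and bails out early when it is not a Hudi table.

```scala
// Hypothetical sketch of a class-name-based Hudi-table check.
// In real code the tables implement org.apache.spark.sql.connector.catalog.Table;
// here stub classes stand in so the example is self-contained.
object TableCheckSketch {
  trait Table
  final class DeltaTableV2 extends Table        // stand-in for Delta's V2 table
  final class HoodieInternalV2Table extends Table // stand-in for Hudi's V2 table

  private val HoodieClassName = "HoodieInternalV2Table"

  // Check the runtime class name instead of calling v1Table, so a
  // non-Hudi table (e.g. Delta) is rejected without triggering its logic.
  def isHoodieTable(t: Table): Boolean =
    t.getClass.getSimpleName == HoodieClassName

  def main(args: Array[String]): Unit = {
    println(isHoodieTable(new DeltaTableV2))          // false: not resolved as Hudi
    println(isHoodieTable(new HoodieInternalV2Table)) // true: resolved as Hudi
  }
}
```

With such a guard in place, `resolveHoodieTable` would return `None` for a Delta table before `v1Table` is ever invoked.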
Change Logs
none
Impact
none
Risk level (write none, low medium or high below)
none
Documentation Update
None
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
When executed on a Delta table, this may result in an error.
What action are we executing here?
For example, from https://docs.delta.io/latest/quick-start.html:

```sql
INSERT OVERWRITE delta.`/tmp/delta-table` SELECT col1 AS id FROM VALUES 5,6,7,8,9;
```
We internally use HoodieCatalog to handle Delta tables and other table types, but Hudi (hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala) calls v1Table even when the table is a Delta table, and Delta then throws an exception. v1Table should not be called when the table is not a Hudi table.
CI report:
- 0c01b0781e8c49da0f07a2379050c2be204cf373 Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build