[HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5
In Hudi, the Spark3_5Adapter calls v2.v1Table, which in turn invokes Delta's own logic. When executed against a Delta table, this can throw an exception. This change therefore switches the Hudi-table detection to a class name check, so that Delta Lake executions no longer hit the error.
When executing the Spark 3.5 Delta test, the following error is reported:

```
org.apache.spark.sql.delta.DeltaIllegalStateException: [DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
	at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall(DeltaErrors.scala:1801)
	at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall$(DeltaErrors.scala:1800)
	at org.apache.spark.sql.delta.DeltaErrors$.invalidV1TableCall(DeltaErrors.scala:3203)
	at org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$v1Table$1(DeltaTableV2.scala:320)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.delta.catalog.DeltaTableV2.v1Table(DeltaTableV2.scala:320)
	at org.apache.spark.sql.adapter.Spark3_5Adapter.$anonfun$resolveHoodieTable$1(Spark3_5Adapter.scala:57)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.adapter.Spark3_5Adapter.resolveHoodieTable(Spark3_5Adapter.scala:52)
	at org.apache.spark.sql.hudi.analysis.HoodieAnalysis$ResolvesToHudiTable$.unapply(HoodieAnalysis.scala:362)
```
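The idea behind the fix can be sketched as follows. This is a hypothetical, self-contained illustration (the class names `TableCheckSketch`, the stand-in `Table` trait, and the stub table classes are not the actual Hudi or Spark code): instead of calling `v1Table` on every V2 table, which path-based `DeltaTableV2` instances forbid, the adapter first inspects the class name of the table implementation and bails out early when it is not a Hudi table.

```scala
// Hypothetical sketch of a class-name-based Hudi-table check.
// In real code the tables implement org.apache.spark.sql.connector.catalog.Table;
// here stub classes stand in so the example is self-contained.
object TableCheckSketch {
  trait Table
  final class DeltaTableV2 extends Table        // stand-in for Delta's V2 table
  final class HoodieInternalV2Table extends Table // stand-in for Hudi's V2 table

  private val HoodieClassName = "HoodieInternalV2Table"

  // Check the runtime class name instead of calling v1Table, so a
  // non-Hudi table (e.g. Delta) is rejected without triggering its logic.
  def isHoodieTable(t: Table): Boolean =
    t.getClass.getSimpleName == HoodieClassName

  def main(args: Array[String]): Unit = {
    println(isHoodieTable(new DeltaTableV2))          // false: not resolved as Hudi
    println(isHoodieTable(new HoodieInternalV2Table)) // true: resolved as Hudi
  }
}
```

With such a guard in place, `resolveHoodieTable` would return `None` for a Delta table before `v1Table` is ever invoked.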
Change Logs
none
Impact
none
Risk level (write none, low medium or high below)
none
Documentation Update
None
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
When executed on a Delta table, this may result in an error.
What action are we executing here?
For example, from https://docs.delta.io/latest/quick-start.html:

```sql
INSERT OVERWRITE delta.`/tmp/delta-table` SELECT col1 AS id FROM VALUES 5,6,7,8,9;
```
We internally use HoodieCatalog to handle Delta tables and other table types, but Hudi (hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala) calls v1Table even when the table is a Delta table, and Delta then throws an exception. v1Table should not be called when the table is not a Hudi table.
CI report:
- 0c01b0781e8c49da0f07a2379050c2be204cf373 Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build