
Alluxio integration with Spark throws an error when writing/reading a Hudi table

Open qingyuan18 opened this issue 1 year ago • 4 comments

Alluxio Version: 2.9.3

Describe the bug A clear and concise description of what the bug is.

To Reproduce
1: set up an Alluxio cluster
2: copy the Alluxio client jar into the Spark jar path
3: create a Hudi table with an Alluxio namespace URI as its location

CREATE EXTERNAL TABLE tpcds_text_1000.dwd_charge_transaction_record_v_partition3 ( chain_id STRING, create_time STRING, id STRING, ...) using hudi
tblproperties ( type = 'cow', primaryKey = 'id', preCombineField = 'ts' )
PARTITIONED BY (ts STRING)
LOCATION 'alluxio:///1000/dwd_charge_transaction_record_v_partition'

4: use Spark SQL to write into the Hudi table

spark-sql --jars /usr/lib/hudi/hudi-spark-bundle.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf "spark.driver.memory=32G" \
  --conf "spark.executor.memory=40G" \
  --conf "spark.dynamicAllocation.enabled=true"

insert into ....

Expected behavior An exception is thrown because an Alluxio object cannot be serialized in the Spark task:

23/07/16 07:36:03 ERROR TaskSetManager: task 0.0 in stage 0.0 (TID 0) had a not serializable result: alluxio.client.file.URIStatus

  • field (class: alluxio.hadoop.AlluxioFileStatus, name: mUriStatus, type: class alluxio.client.file.URIStatus)
  • object (class alluxio.hadoop.AlluxioFileStatus, AlluxioFileStatus{path=alluxio:/1000/flink-hudi/logevent_sink_test/.hoodie; isDirectory=true; modification_time=1689492891699; access_time=1689492891699; owner=hadoop; group=hadoop; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false; metadata=null})
  • element of array (index: 0)
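The serialization stack above suggests that the outer object (alluxio.hadoop.AlluxioFileStatus) is accepted by Java serialization, while its mUriStatus field has a type (alluxio.client.file.URIStatus) that is not. The following is only a minimal sketch of that failure mode using plain Java serialization. The classes StatusPayload, FileStatusWrapper, FixedPayload and FixedWrapper are hypothetical stand-ins, not the real Alluxio or Hadoop classes, and the sketch does not claim to reproduce how Spark or Hudi actually serialize the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class NotSerializableFieldDemo {

    // Hypothetical stand-in for the inner status object: does NOT implement Serializable.
    static class StatusPayload {
        final String path;
        StatusPayload(String path) { this.path = path; }
    }

    // Hypothetical stand-in for the outer file status: Serializable itself,
    // but it holds a non-transient field whose type is not Serializable.
    static class FileStatusWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final StatusPayload mUriStatus;
        FileStatusWrapper(StatusPayload status) { this.mUriStatus = status; }
    }

    // Same shape as above, but every type in the object graph is Serializable.
    static class FixedPayload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String path;
        FixedPayload(String path) { this.path = path; }
    }

    static class FixedWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final FixedPayload mUriStatus;
        FixedWrapper(FixedPayload status) { this.mUriStatus = status; }
    }

    // Attempts Java serialization and reports whether the object graph was accepted.
    static String trySerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return "ok";
        } catch (NotSerializableException e) {
            // The exception message names the offending class.
            return "not serializable: " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fails on the field type, even though the wrapper itself is Serializable.
        System.out.println(trySerialize(new FileStatusWrapper(new StatusPayload("alluxio:/demo"))));
        // Succeeds once the whole object graph is Serializable.
        System.out.println(trySerialize(new FixedWrapper(new FixedPayload("alluxio:/demo"))));
    }
}
```

In this sketch the failure disappears only when the field type itself becomes serializable (or the field is excluded from serialization). Whether the same holds for the real classes depends on how Alluxio and Spark handle them.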

Urgency serv3; does not affect production immediately

Are you planning to fix it Please indicate if you are already working on a PR.

Additional context Add any other context about the problem here.

qingyuan18 · Jul 21 '23 10:07

Thanks for raising the issue, I will take a look!

jja725 · Jul 22 '23 02:07

@HelloHorizon Do you think we can support this use case?

jja725 · Jul 26 '23 19:07

Check this PR, it may resolve the issue: https://github.com/Alluxio/alluxio/pull/18266/files

xiaohu-liu · Mar 21 '24 01:03

You can also configure URIStatus to be serialized with Kryo and try again.

xiaohu-liu · Mar 21 '24 01:03