Alluxio integration with Spark throws an error when writing/reading a Hudi table
Alluxio Version: 2.9.3
To Reproduce
1: Set up an Alluxio cluster.
2: Copy the Alluxio client jar into Spark's jars directory.
3: Create a Hudi table whose location is an Alluxio URI:
CREATE EXTERNAL TABLE tpcds_text_1000.dwd_charge_transaction_record_v_partition3 (
  chain_id STRING,
  create_time STRING,
  id STRING,
  ...)
USING hudi
TBLPROPERTIES (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
)
PARTITIONED BY (ts STRING)
LOCATION 'alluxio:///1000/dwd_charge_transaction_record_v_partition'
4: Write into the Hudi table using Spark SQL:
spark-sql --jars /usr/lib/hudi/hudi-spark-bundle.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf "spark.driver.memory=32G" \
  --conf "spark.executor.memory=40G" \
  --conf "spark.dynamicAllocation.enabled=true" \
  -e "insert into ...."
Expected behavior: the insert should succeed. Instead, the Spark task fails because an Alluxio object in its result is not serializable:
23/07/16 07:36:03 ERROR TaskSetManager: task 0.0 in stage 0.0 (TID 0) had a not serializable result: alluxio.client.file.URIStatus
- field (class: alluxio.hadoop.AlluxioFileStatus, name: mUriStatus, type: class alluxio.client.file.URIStatus)
- object (class alluxio.hadoop.AlluxioFileStatus, AlluxioFileStatus{path=alluxio:/1000/flink-hudi/logevent_sink_test/.hoodie; isDirectory=true; modification_time=1689492891699; access_time=1689492891699; owner=hadoop; group=hadoop; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false; metadata=null})
- element of array (index: 0)
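The trace above says that AlluxioFileStatus is being serialized but its mUriStatus field (a URIStatus) cannot be. The following minimal Java sketch reproduces that failure mode with plain Java serialization, assuming that is the path Spark takes here (the field/object/array trace format suggests it). FakeUriStatus, FakeFileStatus and trySerialize are hypothetical stand-ins, not Alluxio or Spark classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationRepro {

    // Hypothetical stand-in for alluxio.client.file.URIStatus:
    // note that it does NOT implement java.io.Serializable.
    static class FakeUriStatus {
        final String path;

        FakeUriStatus(String path) {
            this.path = path;
        }
    }

    // Hypothetical stand-in for alluxio.hadoop.AlluxioFileStatus:
    // the wrapper is Serializable, but its mUriStatus field is not.
    static class FakeFileStatus implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakeUriStatus mUriStatus;

        FakeFileStatus(FakeUriStatus status) {
            this.mUriStatus = status;
        }
    }

    // Serializes obj with plain Java serialization. Returns null on success,
    // or the NotSerializableException message (the offending class name).
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An array of statuses, mirroring "element of array (index: 0)" in the trace.
        Object[] result = {
            new FakeFileStatus(new FakeUriStatus("/1000/flink-hudi/logevent_sink_test/.hoodie"))
        };
        System.out.println(trySerialize(result));
    }
}
```

Running main prints `SerializationRepro$FakeUriStatus`: the wrapper being Serializable is not enough, because Java serialization walks every non-transient field and fails on the first type that does not implement Serializable, just as the Spark error names alluxio.client.file.URIStatus.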
Urgency: sev3; does not affect production immediately.
Thanks for raising the issue, I'll take a look!
@HelloHorizon do you think we can support this use case?
Check the PR; it may resolve the issue: https://github.com/Alluxio/alluxio/pull/18266/files
You can also configure URIStatus to be serialized with Kryo and try again.
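The Kryo suggestion above is one route. For comparison, here is a sketch of a different, plain-Java route: keep the non-serializable object in a `transient` field and use `writeObject`/`readObject` to persist only the plain values needed to rebuild it. This is a general Java serialization pattern, not necessarily what the linked PR does; SafeFileStatus, FakeUriStatus and roundTrip are hypothetical names, and a real status object would carry many more fields than the two assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableStatusSketch {

    // Hypothetical non-Serializable status object (stand-in for URIStatus).
    static class FakeUriStatus {
        final String path;
        final boolean isDirectory;

        FakeUriStatus(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }
    }

    // Hypothetical wrapper: the non-Serializable object lives in a transient
    // field, and writeObject/readObject persist only the plain values needed
    // to rebuild it on the receiving side.
    static class SafeFileStatus implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient FakeUriStatus uriStatus;

        SafeFileStatus(FakeUriStatus uriStatus) {
            this.uriStatus = uriStatus;
        }

        FakeUriStatus getUriStatus() {
            return uriStatus;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes non-transient fields (none here)
            out.writeUTF(uriStatus.path);
            out.writeBoolean(uriStatus.isDirectory);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Java evaluates arguments left to right, matching the write order.
            uriStatus = new FakeUriStatus(in.readUTF(), in.readBoolean());
        }
    }

    // Serializes obj to bytes and reads it back with plain Java serialization.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SafeFileStatus copy = roundTrip(new SafeFileStatus(
            new FakeUriStatus("/1000/flink-hudi/logevent_sink_test/.hoodie", true)));
        System.out.println(copy.getUriStatus().path + " " + copy.getUriStatus().isDirectory);
    }
}
```

Running main prints `/1000/flink-hudi/logevent_sink_test/.hoodie true`. The trade-off versus Kryo registration is that the wrapper class itself must change, but it then works with any serializer that falls back to Java serialization.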