Kai Huang

136 comments of Kai Huang

There are more discussions here: https://github.com/intel-analytics/arda-docker/issues/682 We need to confirm them before merging this PR.

Why must the number of partitions equal the number of workers? Repartition is expensive; if the number of partitions is already larger than the number of...

> > Why must the number of partitions equal the number of workers? Repartition is expensive; if the number of partitions is already larger than the...

Will coalesce result in unbalanced partitions? E.g. node1 has 9 partitions and node2 has 1 partition; after coalescing to 2 partitions, will each new partition have 5 smaller partitions or...
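The question above can be made concrete with a simplified, hypothetical model of what `coalesce(n)` does: it merges whole parent partitions into `n` new partitions without a shuffle, so individual records never move between partitions and record-level balance is not guaranteed (the real Spark `DefaultPartitionCoalescer` also weighs data locality, which this sketch ignores; `coalesce_groups` is an illustrative name, not a Spark API):

```python
# Simplified model of coalesce(n): whole parent partitions are grouped
# into n new partitions without a shuffle. Here a round-robin grouping
# is used purely for illustration; Spark's DefaultPartitionCoalescer
# additionally prefers grouping partitions that share a host.
def coalesce_groups(num_parent_partitions, num_target):
    groups = [[] for _ in range(num_target)]
    for p in range(num_parent_partitions):
        groups[p % num_target].append(p)
    return groups

# 10 parent partitions coalesced into 2 groups of 5 whole partitions
print(coalesce_groups(10, 2))
# → [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

Under this model, if node1 holds 9 partitions and node2 holds 1, each new partition is a bag of whole parent partitions, so the resulting data sizes can stay skewed.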

After changing to coalesce, the following error is thrown:

```
> format(target_id, ".", name), value)
E py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
E : org.apache.spark.scheduler.BarrierJobUnsupportedRDDChainException: [SPARK-24820][SPARK-24821]: Barrier execution mode...
```

@jason-dai @jenniew barrier can't be performed on an RDD that comes from coalesce... So do we still keep the repartition if rdd.getNumPartitions > num workers?

> take

Seems not? To reduce the number of partitions without a shuffle, use coalesce, which can't be combined with barrier. Increasing the number of partitions without a shuffle is unsupported: https://stackoverflow.com/questions/71070709/increase-the-number-of-partitions-without-repartition-on-hadoop

> Will coalesce result in unbalanced partitions? E.g. node1 has 9 partitions and node2 has 1 partition; after coalescing to 2 partitions, will each new partition have 5 smaller partitions...

Issue conclusion:
- The implementation of the Spark backend requires the number of data partitions to equal the number of workers (a limitation of the original design, and not possible to...