Move to Barrier Executors

Open · dmmiller612 opened this issue 5 years ago · 1 comment

Spark 2.4.0 added Barrier Executors (barrier execution mode), which ensure that all tasks in a stage run at the same time. We should use this for training in SparkFlow.
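For context, here is a minimal sketch of what a barrier stage looks like in PySpark 2.4+. It is not SparkFlow's training code: `local_step`, `barrier_task`, `run_barrier_stage`, and `main` are hypothetical names, and the per-partition "training" is a placeholder sum. The Spark calls used are `RDD.barrier()`, `BarrierTaskContext.get()`, `BarrierTaskContext.barrier()`, and `partitionId()`.

```python
def local_step(rows):
    """Hypothetical stand-in for per-partition training: just sums the rows."""
    return sum(rows)


def barrier_task(rows):
    """Runs once per partition inside a barrier stage."""
    # Imported here so the module itself loads without Spark installed.
    from pyspark import BarrierTaskContext

    ctx = BarrierTaskContext.get()
    # Block until every task in the stage has reached this point.
    ctx.barrier()
    yield (ctx.partitionId(), local_step(list(rows)))


def run_barrier_stage(sc, data, num_partitions=4):
    """Launch all partitions together; the stage needs num_partitions free slots."""
    rdd = sc.parallelize(data, num_partitions)
    return rdd.barrier().mapPartitions(barrier_task).collect()


def main():
    """Assumes a local Spark >= 2.4 installation with at least 4 cores' worth of slots."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[4]")
        .appName("barrier-sketch")
        .getOrCreate()
    )
    try:
        print(run_barrier_stage(spark.sparkContext, range(100)))
    finally:
        spark.stop()
```

Calling `main()` runs the stage locally. Spark launches the tasks of a barrier stage together, so the cluster needs enough free slots for every partition at once.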

dmmiller612, Nov 28 '18 14:11

I would love that. It seems to me that when the computations do not run in parallel, models can underperform because the partitions that finish last have the biggest impact on the final model (https://www.youtube.com/watch?v=nNrdv45O3pE at 15:00). At least this is what I understood when listening to this interesting talk.

BTW: Should all partitions of the training data be the same size? Are there any guarantees on how close the model performance of this async training is to that of normal training, or other estimates that would help me get a grasp of the impact of this execution model? For example: given the same initial weights, would different model-building runs on the same data lead to very different final weights, just because other processes running on the workers cause partitions to finish in different orders?

Edit: OK, I found the paper: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.pdf
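One way to get a feel for the ordering question is a toy experiment. The sketch below is sequential plain Python, not Hogwild and not SparkFlow: it only mimics partitions "finishing" in different orders by applying per-partition SGD passes in two different sequences from the same initial weights. The data, the learning rate, and all names are assumptions made up for illustration.

```python
import random


def sgd_pass(weights, samples, lr=0.1):
    """Apply one SGD update per (x, y) sample, in order, for the model y = w0 + w1 * x."""
    w0, w1 = weights
    for x, y in samples:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return (w0, w1)


def train(partitions, order, epochs=5):
    """Start from zero weights and visit the partitions in the given order each epoch."""
    w = (0.0, 0.0)
    for _ in range(epochs):
        for p in order:
            w = sgd_pass(w, partitions[p])
    return w


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
# Toy data drawn from y = 1 + 2x with a little Gaussian noise.
data = [(x, 1.0 + 2.0 * x + rng.gauss(0.0, 0.05)) for x in xs]
partitions = [data[i::4] for i in range(4)]

w_forward = train(partitions, [0, 1, 2, 3])
w_reverse = train(partitions, [3, 2, 1, 0])

print("forward order:", w_forward)
print("reverse order:", w_reverse)
```

On this convex toy problem both orders land near the true weights (1, 2) without being identical, so here the order mainly changes which noise the final updates pick up. That says nothing by itself about non-convex neural network training.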

PS: I love your API and the fact that you decided to make it work so seamlessly with Spark pipelines.

PowerToThePeople111, Jul 03 '19 17:07