Mark Hamstra
No disagreement that just timing the unit tests isn't an adequate measure of the change in scheduler performance. I was just using what I had as a rough check that...
UPDATE: I broke out the renaming and other minor changes into a separate pull request, so after https://github.com/mesos/spark/pull/844 is merged, this one reduces to just the stageId-to-jobId mapping work.
So, a few things worth talking about:

1. This PR essentially boils down to moving at least part of the logic from the JobLogger directly into the DAGScheduler (see the sketch below).
2. At...
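A minimal sketch of the kind of stageId <-> jobId bookkeeping being moved into the scheduler (names like `registerStage` and `cleanupJob` are illustrative assumptions, not the actual patch):

```scala
import scala.collection.mutable.{HashMap, HashSet}

object StageJobMapping {
  // A stage can be needed by more than one job, and a job spans many stages,
  // so the mapping is many-to-many in both directions.
  val stageIdToJobIds = new HashMap[Int, HashSet[Int]]
  val jobIdToStageIds = new HashMap[Int, HashSet[Int]]

  def registerStage(stageId: Int, jobId: Int): Unit = {
    stageIdToJobIds.getOrElseUpdate(stageId, new HashSet[Int]) += jobId
    jobIdToStageIds.getOrElseUpdate(jobId, new HashSet[Int]) += stageId
  }

  // When a job ends, forget its stages unless another job still needs them.
  def cleanupJob(jobId: Int): Unit = {
    for {
      stageIds <- jobIdToStageIds.remove(jobId)
      stageId  <- stageIds
      jobs     <- stageIdToJobIds.get(stageId)
    } {
      jobs -= jobId
      if (jobs.isEmpty) stageIdToJobIds -= stageId
    }
  }
}
```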
Still a work in progress. I'm adding "data structures are now empty" assertions at the end of the tests in DAGSchedulerSuite, and I still need to do some work on shuffleToMapStage in...
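One way the shuffleToMapStage cleanup could look, as a sketch under assumed names (`Stage` here is a stand-in for the scheduler's class, and `onStageRemoved` is hypothetical):

```scala
import scala.collection.mutable.HashMap

object ShuffleMapCleanup {
  case class Stage(id: Int) // stand-in for the scheduler's Stage class

  val shuffleToMapStage = new HashMap[Int, Stage] // shuffleId -> its map stage

  // If entries are never pruned, the map outlives the jobs that filled it.
  // Dropping them when a stage is torn down is what lets the end-of-test
  // "everything is empty" assertions pass.
  def onStageRemoved(removedStageId: Int): Unit =
    shuffleToMapStage.retain { case (_, stage) => stage.id != removedStageId }
}
```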
Updated. Still WIP.
UPDATED: The DAGSchedulerSuite now includes checks in all tests except the "zero split job" test, asserting that all DAGScheduler data structures are empty at the end of each test job. These...
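The shape of such an end-of-test check might be as follows; the field names are assumptions modeled on the DAGScheduler of that era, not quoted from the suite:

```scala
object DAGSchedulerSuiteSketch {
  // SchedulerState stands in for the real DAGScheduler so the sketch is
  // self-contained; the real suite would inspect the scheduler directly.
  trait SchedulerState {
    def activeJobs: Iterable[_]
    def idToActiveJob: Iterable[_]
    def pendingTasks: Iterable[_]
    def shuffleToMapStage: Iterable[_]
    def waiting: Iterable[_]
    def running: Iterable[_]
    def failed: Iterable[_]
  }

  // Called at the end of each test job: every bookkeeping structure should
  // have drained back to empty once the job completes.
  def assertDataStructuresEmpty(s: SchedulerState): Unit = {
    assert(s.activeJobs.isEmpty, "activeJobs not empty")
    assert(s.idToActiveJob.isEmpty, "idToActiveJob not empty")
    assert(s.pendingTasks.isEmpty, "pendingTasks not empty")
    assert(s.shuffleToMapStage.isEmpty, "shuffleToMapStage not empty")
    assert(s.waiting.isEmpty, "waiting stages not empty")
    assert(s.running.isEmpty, "running stages not empty")
    assert(s.failed.isEmpty, "failed stages not empty")
  }
}
```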
I've got spark-perf numbers:

scheduling-tput --num-tasks=10000 --num-trials=10 --inter-trial-wait=30
baseline: 4.955, 0.156, 4.539, 4.955, 4.766
PR842: 5.016, 0.106, 4.786, 5.071, 4.892

scala-agg-by-key --num-trials=10 --inter-trial-wait=30 --num-partitions=1 --reduce-tasks=1 --random-seed=5 --persistent-type=memory --num-records=200 --unique-keys=1 --key-length=10...
It's definitely getting there. I'm curious why DriverSuite is failing at least some of the time and why the Python tests are frequently complaining about [tasks still pending for a...
Rebased and fixed some typos and grammatical niggles.
Addressed the pendingTasks bloat -- it wasn't just the Python tests; it affected every job. I think I got the life-cycle issues correct (i.e. will always be a set of...
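A sketch of the pendingTasks life-cycle being described, with hypothetical names (`taskSubmitted`/`taskFinished` are illustrative): entries are removed as tasks finish, so the map drains back to empty after each job instead of accumulating.

```scala
import scala.collection.mutable.{HashMap, HashSet}

object PendingTasksLifecycle {
  // stageId -> ids of tasks not yet finished; names are illustrative only.
  val pendingTasks = new HashMap[Int, HashSet[Long]]

  def taskSubmitted(stageId: Int, taskId: Long): Unit =
    pendingTasks.getOrElseUpdate(stageId, new HashSet[Long]) += taskId

  // Remove entries as tasks complete and drop the key once a stage drains,
  // so nothing is left behind once the job is done.
  def taskFinished(stageId: Int, taskId: Long): Unit =
    for (tasks <- pendingTasks.get(stageId)) {
      tasks -= taskId
      if (tasks.isEmpty) pendingTasks -= stageId
    }
}
```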