Mark Hamstra
No disagreement that just timing the unit tests isn't an adequate measure of the change in scheduler performance. I was just using what I had as a rough check that...
UPDATE: I broke out the renaming and other minor changes into a separate pull request, so after https://github.com/mesos/spark/pull/844 is merged, this one reduces to just the stageId-to-jobId mapping work.
So, a few things worth talking about:

1. This PR essentially boils down to moving at least part of the logic from the JobLogger directly into the DAGScheduler (see the sketch below).
2. At...
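A minimal sketch of the kind of stageId <-> jobId bookkeeping being moved into the scheduler (names like `registerStage` and `cleanupJob` are illustrative assumptions, not the actual patch):

```scala
import scala.collection.mutable.{HashMap, HashSet}

object StageJobMapping {
  // A stage can be needed by more than one job, and a job spans many stages,
  // so the mapping is many-to-many in both directions.
  val stageIdToJobIds = new HashMap[Int, HashSet[Int]]
  val jobIdToStageIds = new HashMap[Int, HashSet[Int]]

  def registerStage(stageId: Int, jobId: Int): Unit = {
    stageIdToJobIds.getOrElseUpdate(stageId, new HashSet[Int]) += jobId
    jobIdToStageIds.getOrElseUpdate(jobId, new HashSet[Int]) += stageId
  }

  // When a job ends, forget its stages unless another job still needs them.
  def cleanupJob(jobId: Int): Unit = {
    for {
      stageIds <- jobIdToStageIds.remove(jobId)
      stageId  <- stageIds
      jobs     <- stageIdToJobIds.get(stageId)
    } {
      jobs -= jobId
      if (jobs.isEmpty) stageIdToJobIds -= stageId
    }
  }
}
```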
Still a work in progress. I'm adding "data structures are now empty" assertions at the end of the tests in DAGSchedulerSuite, and I still need to do some work on shuffleToMapStage in...
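One way the shuffleToMapStage cleanup could look, as a sketch under assumed names (`Stage` here is a stand-in for the scheduler's class, and `onStageRemoved` is hypothetical):

```scala
import scala.collection.mutable.HashMap

object ShuffleMapCleanup {
  case class Stage(id: Int) // stand-in for the scheduler's Stage class

  val shuffleToMapStage = new HashMap[Int, Stage] // shuffleId -> its map stage

  // If entries are never pruned, the map outlives the jobs that filled it.
  // Dropping them when a stage is torn down is what lets the end-of-test
  // "everything is empty" assertions pass.
  def onStageRemoved(removedStageId: Int): Unit =
    shuffleToMapStage.retain { case (_, stage) => stage.id != removedStageId }
}
```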
Updated. Still WIP.
UPDATED: The DAGSchedulerSuite now includes checks in all tests except the "zero split job" test, asserting that all DAGScheduler data structures are empty at the end of each test job. These...
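The shape of such an end-of-test check might be as follows; the field names are assumptions modeled on the DAGScheduler of that era, not quoted from the suite:

```scala
object DAGSchedulerSuiteSketch {
  // SchedulerState stands in for the real DAGScheduler so the sketch is
  // self-contained; the real suite would inspect the scheduler directly.
  trait SchedulerState {
    def activeJobs: Iterable[_]
    def idToActiveJob: Iterable[_]
    def pendingTasks: Iterable[_]
    def shuffleToMapStage: Iterable[_]
    def waiting: Iterable[_]
    def running: Iterable[_]
    def failed: Iterable[_]
  }

  // Called at the end of each test job: every bookkeeping structure should
  // have drained back to empty once the job completes.
  def assertDataStructuresEmpty(s: SchedulerState): Unit = {
    assert(s.activeJobs.isEmpty, "activeJobs not empty")
    assert(s.idToActiveJob.isEmpty, "idToActiveJob not empty")
    assert(s.pendingTasks.isEmpty, "pendingTasks not empty")
    assert(s.shuffleToMapStage.isEmpty, "shuffleToMapStage not empty")
    assert(s.waiting.isEmpty, "waiting stages not empty")
    assert(s.running.isEmpty, "running stages not empty")
    assert(s.failed.isEmpty, "failed stages not empty")
  }
}
```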
I've got spark-perf numbers:

scheduling-tput --num-tasks=10000 --num-trials=10 --inter-trial-wait=30
baseline: 4.955, 0.156, 4.539, 4.955, 4.766
PR842: 5.016, 0.106, 4.786, 5.071, 4.892

scala-agg-by-key --num-trials=10 --inter-trial-wait=30 --num-partitions=1 --reduce-tasks=1 --random-seed=5 --persistent-type=memory --num-records=200 --unique-keys=1 --key-length=10...
It's definitely getting there. I'm curious why DriverSuite is failing at least some of the time and why the Python tests are frequently complaining about [tasks still pending for a...
Rebased and fixed some typos and grammatical niggles.
Addressed the pendingTasks bloat -- it wasn't just the Python tests; it affected every job. I think I got the life-cycle issues correct (i.e. will always be a set of...
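A sketch of the pendingTasks life-cycle being described, with hypothetical names (`taskSubmitted`/`taskFinished` are illustrative): entries are removed as tasks finish, so the map drains back to empty after each job instead of accumulating.

```scala
import scala.collection.mutable.{HashMap, HashSet}

object PendingTasksLifecycle {
  // stageId -> ids of tasks not yet finished; names are illustrative only.
  val pendingTasks = new HashMap[Int, HashSet[Long]]

  def taskSubmitted(stageId: Int, taskId: Long): Unit =
    pendingTasks.getOrElseUpdate(stageId, new HashSet[Long]) += taskId

  // Remove entries as tasks complete and drop the key once a stage drains,
  // so nothing is left behind once the job is done.
  def taskFinished(stageId: Int, taskId: Long): Unit =
    for (tasks <- pendingTasks.get(stageId)) {
      tasks -= taskId
      if (tasks.isEmpty) pendingTasks -= stageId
    }
}
```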