spark
spark copied to clipboard
Regularly poll executors to track their utilization
This isn't ready to be merged yet (needs tests & docs), but wanted to get some feedback.
The point of this PR is to provide a really high-level metric on how well the cluster is being utilized, by simply polling what percentages of cores are active. I'm hopefully this will diagnose a lot of cases where jobs are slow b/c spark is being used incorrectly -- imbalanced jobs, driver spending all its time merging accumulator values, long breaks between stages, etc.
The ugly part of this is plumbing the events from StandaloneSchedulerBackend --> ClusterScheduler --> SparkContext --> DAGScheduler --> SparkListener. I'm not sure what the right architecture is in general to let any arbitrary component get an event to the SparkListeners.
Thank you for your pull request. An admin will review this request soon.
Why not ganglia or other external monitoring tools?
@rxin yeah ganglia could provide something pretty close to this. But I thought this was useful b/c
(1) this is so simple, its useful to have even if you don't setup ganglia. And as this is integrated right into spark, its easier to connect these measures w/ whats going on in your code (you don't just have a ganglia graph w/ times, which then you've got connect back to what was going on in your code). Its not measuring the exact same thing as ganglia would w/ core utilization, but I can't make a really strong case why this is particularly better. If there is eventually really tight integration w/ ganglia, then maybe this could get ripped out.
(2) I was hoping that this might get used for more thorough polling of the executors, eg. stack trace sampling, task progress, etc. So might be a useful stepping stone even if the "core utilization" part is dropped eventually.
Imran, this approach looks good to me, but I'm going to send it to Patrick, who's been looking at monitoring stuff too. I think these are reasonable API calls to add to the listener though.
Hey Imran,
Just a high level question (haven't done a close look yet). If this is all of the information we are collecting - why do you need to poll the executors in the first place?
The information of which tasks are running on which executor when is available directly at the driver. You could actually get much finer grained utilization statistics using that information without the need to add RPC's.
- Patrick
Ah well, part of the motivation for this is that we noticed huge delays between when the executor thinks its finished a task, and when the driver fully registered it -- over 50% of the time actually for some of our workloads. We were able to fix this when we discovered it (in our case it seemed to be mostly the cost of merging the accumulator results on the driver), but I'd really like to make this inefficiency more obvious.
If we really wanted to, we could probably do this entirely within the driver, but I guess I did it this way b/c I was hoping to piggyback more info on top of those RPCs in the future -- eg., maybe jvm metrics, shuffle status, counters could also come via the same mechanism.
FWIW I've wanted a quick & dirty "what's the cluster utilization like?" UI for a long time--agreed that other tools should be used for more extensive monitoring, but it would be nice to have some really basic info available for free/out of the box.
(I haven't looked at the code, so can't comment on the approach/implementation, just chiming in to say I'd enjoy seeing this happen.)
Thank you for your pull request. An admin will review this request soon.