
Regularly poll executors to track their utilization

Open squito opened this issue 12 years ago • 8 comments

This isn't ready to be merged yet (needs tests & docs), but wanted to get some feedback.

The point of this PR is to provide a really high-level metric on how well the cluster is being utilized, by simply polling what percentage of cores are active. I'm hopeful this will diagnose a lot of cases where jobs are slow b/c Spark is being used incorrectly -- imbalanced jobs, driver spending all its time merging accumulator values, long breaks between stages, etc.
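To make the metric concrete, here is a rough sketch of the arithmetic behind "percentage of cores active". It is not the code in this PR: `ExecutorSample`, `utilization`, and `mean_utilization` are hypothetical names, and the sketch assumes each poll returns an active and total core count per executor.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ExecutorSample:
    """One poll result from a single executor (hypothetical shape)."""
    executor_id: str
    active_cores: int
    total_cores: int


def utilization(samples: Iterable[ExecutorSample]) -> float:
    """Fraction of cores busy across all executors in one poll."""
    samples = list(samples)
    total = sum(s.total_cores for s in samples)
    if total == 0:
        return 0.0
    active = sum(s.active_cores for s in samples)
    return active / total


def mean_utilization(polls: List[List[ExecutorSample]]) -> float:
    """Average the per-poll utilization over a series of polls."""
    if not polls:
        return 0.0
    return sum(utilization(p) for p in polls) / len(polls)
```

A job that keeps one executor saturated while another sits idle would show up here as roughly 50% utilization, which is the kind of imbalance the metric is meant to surface.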

The ugly part of this is plumbing the events from StandaloneSchedulerBackend --> ClusterScheduler --> SparkContext --> DAGScheduler --> SparkListener. I'm not sure what the right architecture is in general to let any arbitrary component get an event to the SparkListeners.
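One possible shape for "any component can get an event to the listeners" is a shared bus that components post to directly, instead of passing the event through each layer. The sketch below only illustrates that idea; `ListenerBus`, `add_listener`, and `post` are hypothetical names, not Spark APIs.

```python
from typing import Any, Callable, List

Listener = Callable[[Any], None]


class ListenerBus:
    """Fans out events from any component to all registered listeners."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def post(self, event: Any) -> None:
        # Deliver synchronously, in registration order.
        for listener in self._listeners:
            listener(event)
```

With something like this, the scheduler backend would hold a reference to the bus and call `post` itself, so the intermediate classes would not need to know about the event type.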

squito avatar May 15 '13 21:05 squito

Thank you for your pull request. An admin will review this request soon.

AmplabJenkins avatar May 15 '13 21:05 AmplabJenkins

Why not ganglia or other external monitoring tools?

rxin avatar May 15 '13 21:05 rxin

@rxin yeah ganglia could provide something pretty close to this. But I thought this was useful b/c

(1) this is so simple, it's useful to have even if you don't set up ganglia. And as this is integrated right into Spark, it's easier to connect these measures w/ what's going on in your code (you don't just have a ganglia graph w/ times, which you've then got to connect back to what was going on in your code). It's not measuring the exact same thing as ganglia would w/ core utilization, but I can't make a really strong case why this is particularly better. If there is eventually really tight integration w/ ganglia, then maybe this could get ripped out.

(2) I was hoping that this might get used for more thorough polling of the executors, e.g. stack trace sampling, task progress, etc. So it might be a useful stepping stone even if the "core utilization" part is dropped eventually.

squito avatar May 16 '13 00:05 squito

Imran, this approach looks good to me, but I'm going to send it to Patrick, who's been looking at monitoring stuff too. I think these are reasonable API calls to add to the listener though.

mateiz avatar Jun 13 '13 20:06 mateiz

Hey Imran,

Just a high level question (haven't done a close look yet). If this is all of the information we are collecting - why do you need to poll the executors in the first place?

The information about which tasks are running on which executor, and when, is available directly at the driver. You could actually get much finer-grained utilization statistics using that information without the need to add RPCs.

- Patrick
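For illustration, the driver-only alternative could look roughly like the sketch below. It assumes the driver records a start and end time for every task and that each task occupies exactly one core. `core_utilization` and its parameters are hypothetical names, not existing Spark code.

```python
from typing import List, Tuple

# (start_time, end_time) in seconds for each task, as seen by the driver.
TaskInterval = Tuple[float, float]


def core_utilization(tasks: List[TaskInterval], total_cores: int,
                     window_start: float, window_end: float) -> float:
    """Busy core-seconds divided by available core-seconds in the window.

    Assumes each task occupies exactly one core while it runs.
    """
    window = window_end - window_start
    if window <= 0 or total_cores <= 0:
        return 0.0
    busy = 0.0
    for start, end in tasks:
        # Clip each task to the window before counting it.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return busy / (total_cores * window)
```

Because this uses exact intervals rather than periodic samples, the window can be made as small as desired, which is where the finer granularity comes from.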

pwendell avatar Jun 14 '13 18:06 pwendell

Ah well, part of the motivation for this is that we noticed huge delays between when the executor thinks it's finished a task and when the driver has fully registered it -- over 50% of the time actually for some of our workloads. We were able to fix this when we discovered it (in our case it seemed to be mostly the cost of merging the accumulator results on the driver), but I'd really like to make this inefficiency more obvious.
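A rough way to quantify that gap is sketched below, assuming three timestamps are available per task. `TaskTiming` and `driver_overhead_fraction` are hypothetical names, and the sketch ignores clock skew between executor and driver machines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskTiming:
    launched: float            # driver launches the task
    executor_finished: float   # executor considers the task done
    driver_registered: float   # driver has fully processed the result


def driver_overhead_fraction(timings: List[TaskTiming]) -> float:
    """Share of total task wall time spent after the executor finished."""
    total = sum(t.driver_registered - t.launched for t in timings)
    if total <= 0:
        return 0.0
    lag = sum(t.driver_registered - t.executor_finished for t in timings)
    return lag / total
```

A value above 0.5 would correspond to the "over 50% of the time" case described above, where the driver side dominates.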

If we really wanted to, we could probably do this entirely within the driver, but I guess I did it this way b/c I was hoping to piggyback more info on top of those RPCs in the future -- e.g., maybe JVM metrics, shuffle status, and counters could also come via the same mechanism.

squito avatar Jun 15 '13 15:06 squito

FWIW I've wanted a quick & dirty "what's the cluster utilization like?" UI for a long time--agreed that other tools should be used for more extensive monitoring, but it would be nice to have some really basic info available for free/out of the box.

(I haven't looked at the code, so can't comment on the approach/implementation, just chiming in to say I'd enjoy seeing this happen.)

stephenh avatar Jun 17 '13 18:06 stephenh

Thank you for your pull request. An admin will review this request soon.

AmplabJenkins avatar Aug 05 '13 21:08 AmplabJenkins