bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Cpu timec to bep

Open alexei-grigoriev opened this issue 3 years ago • 2 comments

This PR adds support to BEP for collecting aggregated CPU time of spawned subprocesss, grouped by mnemonic.

Main use case is to identify which mnemonics are bottlenecks and should be prioritized for optimization. Such optimizations can be done by improving associated tool chains or improving cache hit ratio for those mnemonics. Collecting the data via BEP, from all user and all CI builds, ensure that the metrics are representative in general (not only cherry-picked specific scenarios). Trends can be visualized on dashboards grouped by mnemonic.

When observing increased average builds time trends (wall clock time) for an organization, it is often not obvious to identify the cause. CPU time metrics can then not only help identifying regressions for a particular mnemonic, but by correlating CPU time with wall clock time, it can also indicate if the cause is an environmental issue or not.

alexei-grigoriev avatar Jun 08 '22 14:06 alexei-grigoriev

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Jun 08 '22 14:06 google-cla[bot]

Is there any update on the reviewing of this PR?

crydell-ericsson avatar Dec 15 '22 14:12 crydell-ericsson

Nice that you explicitly handle the case of missing times. Do you expect a given mnemonic to either always have times or never, or for it to be more mixed? And how is the Empty state materially different from having a zero duration? You could drop the tristate class and instead have a boolean indicating validity, we already know how many actions were executed.

larsrc-google avatar Jan 24 '23 08:01 larsrc-google

That's correct. For a particular mnemonic I expect mixed i.e. some actions to have times and other no not have them. For example we could have a bunch of tests which are executed remotely but some of them are tagged for local execution and only locally executed would bring times. And in this case I consider timings totally not consistent.

alexei-grigoriev avatar Jan 24 '23 09:01 alexei-grigoriev

@larsrc-google Do You mean something like that: systemTimeValid = false; if (numActions ==0) systemTimeValid = true; if (actionStats.systemTime.isEmpty()) systemTimeValid = false; else { accumulate time....}

so that instead of Empty state handling we rely on the actions counter

looks a bit more tangled but way less verbose to me.

alexei-grigoriev avatar Jan 24 '23 09:01 alexei-grigoriev

Well, if there were no actions, there's no reason to output the accumulated time in the first place. If there were actions (numActions > 0) but systemTime.isEmpty(), then the time would have to be invalid. So I would reduce it to

if (numActions > 0) {
  if (actionsStats.systemTime.isValid()) {
    <accumulate system time...>
  }
  if (actionStats.userTime.isValid()) {
    <accumulate user time...>
  }
}

Your nicely functional update implementation creates rather a lot of unnecessary data. I don't see the benefit in creating new accumulator objects on each update, a single object that updates atomically should be enough here. Or even just use a LongAccumulator/LongAdder in ActionStats.

larsrc-google avatar Jan 24 '23 11:01 larsrc-google

But there is no way to invalidate times unless some Boolean is used along with the adder. Like if there were several actions processed and their times was accumulated then comes an action without times, I guess we want to express that times for particular mnemonic are inconsistent and shouldn't be printed.

alexei-grigoriev avatar Jan 24 '23 13:01 alexei-grigoriev

A wrapper around the Accumulator/Adder that tracks validity would make sense. Then it could also have an interface that takes Duration and just use the long value internally.

larsrc-google avatar Jan 25 '23 11:01 larsrc-google

@larsrc-google what do you think about my latest commit using microseconds? Would you like something more from me to land this?

alexei-grigoriev avatar Jan 31 '23 08:01 alexei-grigoriev

Looks good overall, I'll add some smaller comments soon.

larsrc-google avatar Jan 31 '23 15:01 larsrc-google

I am a bit confused with it. The darwin xcode tests are failing, despite I didn't make any specific changes neither for xcode nor for darwin.

alexei-grigoriev avatar Feb 20 '23 09:02 alexei-grigoriev

Ugh, not again. There has been a number of failures on the Darwin tests since a recent update. @fweikert

Don't worry about trying to fix that, just let me know when you're ready for further review.

larsrc-google avatar Feb 20 '23 09:02 larsrc-google

just let me know when you're ready for further review

It is ready.

alexei-grigoriev avatar Feb 20 '23 10:02 alexei-grigoriev

Hi @larsrc-google, Since I can see that this PR has been approved, please let me know whether I should proceed with importing it. Thanks!

sgowroji avatar Mar 15 '23 12:03 sgowroji

I noticed, there is no activity here for a while? Is there anything I can do to help it progressing?

alexei-grigoriev avatar Apr 14 '23 15:04 alexei-grigoriev

@sgowroji Yes, you can go ahead, if syncing to head fixes the failing test.

@alexei-grigoriev I'm guessing the test failure is unrelated, if it sticks around when moved to head you'll need to fix it.

larsrc-google avatar Apr 19 '23 08:04 larsrc-google

Unfortunately, we made a change in the metrics code (replacing Duration with ints to save both memory and time), and now your change needs to be updated. I would suggest changing your DurationAdder to just work with the ints, even though you lose micro-second precision.

larsrc-google avatar Apr 20 '23 10:04 larsrc-google

Hi @larsrc-google Could You please check why it has been closed unmerged?

alexei-grigoriev avatar Apr 28 '23 06:04 alexei-grigoriev

Hi @alexei-grigoriev, PR is merged at 5f1a5709b34ca578f9bbb995d62cd028dac3b03c.

sgowroji avatar Apr 28 '23 07:04 sgowroji