Adding CPU Measurements and Unit Tests
This commit reports CPU user time and CPU system time. These values are already collected, but not printed in any log.
Part one
The solution adds CPU metrics after a build as seen below:
INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Total action wall time 70.05s
INFO: Critical path 70.08s (setup 0.00s, action wall time 70.05s)
INFO: Elapsed time 71.58s (preparation 1.45s, execution 70.13s)
INFO: CPU time 5.31s (user 0.00s, system 0.01s, bazel 5.29s)
INFO: Build completed successfully, 2 total actions
If any of the values is not collected, "???" is printed instead, like:
INFO: 415 processes: 1 internal, 414 processwrapper-sandbox.
INFO: Total action wall time 127.84s
INFO: Critical path 2.58s (setup 0.00s, action wall time 2.51s)
INFO: Elapsed time 38.37s (preparation 3.98s, execution 34.39s)
INFO: CPU time ???s (user 79.45s, system 22.39s, bazel ???s)
INFO: Build completed successfully, 415 total actions
Note: The values are only presented if the following flag is set: --experimental_stats_summary
The following flags must be set for the measurements to be collected: --experimental_collect_local_sandbox_action_metrics and --experimental_profile_cpu_usage (set by default)
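The "???" fallback described above can be sketched as follows (a minimal illustration with a hypothetical helper name; the real implementation differs): a measurement that was not collected is modeled as null and printed as "???".

```java
import java.util.Locale;

public class CpuTimeFormat {
  // Hypothetical helper (not the actual Bazel code): a null value models a
  // measurement that was not collected, which is printed as "???".
  static String formatSeconds(Double seconds) {
    return seconds == null ? "???s" : String.format(Locale.US, "%.2fs", seconds);
  }

  public static void main(String[] args) {
    System.out.println("user " + formatSeconds(79.45));  // user 79.45s
    System.out.println("bazel " + formatSeconds(null));  // bazel ???s
  }
}
```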
Part two
A unit test is added in runtime-tests. Test command: ./misc/bazel test --experimental_stats_summary //src/test/java/com/google/devtools/build/lib:runtime-tests
Note: Because the Reporter class was marked final, it could not be mocked in the unit test. We didn't find a better way to solve this issue, so in this commit the final keyword was removed from Reporter.
Hello, I couldn't reproduce the issue locally. Does anyone know where to get the logs below?
FAILED: //src/test/java/com/google/devtools/build/lib:runtime-tests (Summary)
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test.log
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test_attempts/attempt_1.log
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test_attempts/attempt_2.log
Here is the answer to my question above:
- Choose the failed test suite by pressing the "Details" link in GitHub. That will take you to https://buildkite.com
- Choose the failed test suite again on buildkite.com.
- Switch the tab from "Log" to "Artifacts" for the suite on buildkite.com to access the log files.
The new commit fixed the issue on macOS. It's because macOS used ',' to separate the integer and decimal fraction, instead of the '.' used on other OSes.
E.g.: on macOS it's '9,00'; on other OSes it's '9.00'.
@ecngtng It's entirely possible that the difference in punctuation is not due to being on macOS per se, but due to the Mac doing different i18n.
@larsrc-google So in this case, is it reasonable to distinguish the punctuation based on keywords in the OS name? Or is there a better way?
Distinguishing on OS might cause problems for users in other countries with different default Locale.
One alternative could be to get the JVM's decimal separator char for the current Locale.getDefault(), via DecimalFormatSymbols.getInstance().getDecimalSeparator(), and use that in the condition.
Another alternative could be to use String.format also when constructing the expected result in the test case.
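Both alternatives can be sketched as follows (a minimal demo, not the PR's actual test code):

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class DecimalSeparatorDemo {
  public static void main(String[] args) {
    // Alternative 1: ask the JVM for the locale-dependent decimal separator
    // and branch on it, instead of branching on the OS name.
    char sep = DecimalFormatSymbols.getInstance(Locale.getDefault()).getDecimalSeparator();
    System.out.println("default separator: " + sep);

    // Alternative 2: build the expected string with String.format, so the test
    // uses the same locale-dependent formatting as the code under test.
    System.out.println(String.format(Locale.US, "%.2f", 9.0));      // 9.00
    System.out.println(String.format(Locale.GERMANY, "%.2f", 9.0)); // 9,00
  }
}
```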
@ulrfa @larsrc-google Thanks for your good comments. The code has been updated according to your comments.
Is this the right place to expose it? Most users likely don't care about CPU time.
The idea behind --experimental_stats_summary is to provide the most useful information in a more concise way.
If a user cares, they can always look it up in the BEP and the trace profile.
Thank you @meisterT for reviewing!
AFAIK, BEP provides CPU time only for:
- The JVM process.
and the --profile json file contains CPU usage only for:
- The JVM Process.
- All processes running on local host (including other builds and other concurrently executing processes).
I assume the metrics above are relevant mostly for developers of the Bazel codebase itself. But this PR provides CPU time for:
- The JVM process. (“bazel”)
- All subprocesses spawned in sandboxes for the particular build. (“user” and “system”)
I think the CPU time, for the JVM and subprocesses combined, is one of the most relevant metrics for a user that would like to grasp the overall resource consumption of a build.
In the future, I imagine storing each subprocess's CPU time per action in the profile JSON file, and then calculating a summary grouped by mnemonic, to see how much CPU each tool in the build chain consumes. But I think this PR is a good start and makes the information conveniently available.
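The grouped-by-mnemonic summary imagined above could look like this (hypothetical data and class name, not existing Bazel code):

```java
import java.util.HashMap;
import java.util.Map;

public class CpuByMnemonic {
  public static void main(String[] args) {
    // Hypothetical per-action records (mnemonic, CPU seconds); in the imagined
    // future these would come from per-action entries in the profile JSON.
    String[][] actions = {
      {"CppCompile", "1.5"},
      {"CppCompile", "2.0"},
      {"Javac", "0.7"},
    };

    // Sum CPU time per mnemonic to see how much CPU each tool consumes.
    Map<String, Double> byMnemonic = new HashMap<>();
    for (String[] a : actions) {
      byMnemonic.merge(a[0], Double.parseDouble(a[1]), Double::sum);
    }
    System.out.println(byMnemonic);
  }
}
```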
Perhaps --experimental_stats_summary is not suitable for this. @meisterT, do you have ideas about other ways to control verbosity?
@meisterT What are your comments on @ulrfa's response? : )
@wilwell
Hello @ecngtng, could you please confirm that all the review comments are taken care of? Thanks.
@ckolli5 Yes, all the comments are taken care of.
Code has been changed according to comments and all checks have passed.
I just re-requested review from wilwell, but I didn't remove the requests for meisterT and larsrc-google. I don't know why those two were removed by the system, or whether I did something wrong. Anyway, any comments are welcome!
@wilwell Thanks for your comments. Regarding your comment 'Could you move please this inititalization on executorInit?', I tried moving the initialization to executorInit as you mentioned, but some test cases failed afterwards. It seems safer to keep them here.
@wilwell Per your comment 'Call executorInit with options, instead of creating this method', I made some code changes. The tests passed in my local environment, but two failures were triggered when the changes were pushed to GitHub. After checking the related logs, I haven't found the root cause yet. Could you please help check and give any clue? Thanks!
From the logs, we found the failure was because one test failed in 'Unit tests for tools/test/collect_cc_code_coverage.sh' (i.e., test_cc_test_coverage_gcov). 'collect_cc_code_coverage.sh' collects code coverage for C++ sources, but our code change doesn't touch C++. I'm curious how it could be affected.
The problem with tests looks strange. Have you tried to merge/pull master branch into yours?
Do you mean to rebase my commit?
Yes, rebase your commit on master and rerun the tests, please.
@wilwell All checks have passed. If you have any further comments, please feel free to let us know.
@larsrc-google Could you please also take a look at this change? If everything is OK, I am going to merge it.
@larsrc-google Could you please help take a look at this change, as it has been pending for quite a while? : )
@larsrc-google The code has been changed according to your comments. Could you please check? Thanks!
Thank you. You missed one comment from before, and I added a couple small things.
@larsrc-google Thanks for your good comments. All of them have been addressed as you suggested, except the one I'm not sure I understood. About "Zeroing them on startup would also make it clearer that they are properly initialized.": the zeroing is done in both executorInit() and buildComplete(). Is that not OK?
I don't see the zeroing happening in executorInit(), only in buildComplete(). If you zeroed them in executorInit(), that would be sufficient.
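The zeroing pattern being discussed could be sketched as follows (field and method names here are assumptions for illustration, not the actual Bazel implementation):

```java
// Sketch of the zeroing pattern under discussion; all names are hypothetical.
public class CpuTimeCollector {
  private double userTimeSec;
  private double systemTimeSec;

  // Zeroing here makes it explicit that the accumulators start from a clean
  // state for each build, as the reviewer suggested.
  void executorInit() {
    userTimeSec = 0;
    systemTimeSec = 0;
  }

  // Accumulate the CPU time reported for each spawned subprocess.
  void addSpawnResult(double userSec, double systemSec) {
    userTimeSec += userSec;
    systemTimeSec += systemSec;
  }

  // Report and reset after the build completes.
  void buildComplete() {
    System.out.printf("user %.2fs, system %.2fs%n", userTimeSec, systemTimeSec);
    userTimeSec = 0;
    systemTimeSec = 0;
  }

  double userTimeSec() { return userTimeSec; }
}
```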
@larsrc-google You're right. After checking, I found they were indeed missing from the last commits. I added them back now. (Very interesting; I had checked the code earlier and was sure they were merged several days ago.)
@larsrc-google The code has been changed and all checks passed. Could you please check? Thanks!