Adding CPU Measurements and Unit Tests
This commit reports CPU user time and CPU system time. These values are already collected, but not printed in any log.
Part one
The solution adds CPU metrics after a build as seen below:
INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Total action wall time 70.05s
INFO: Critical path 70.08s (setup 0.00s, action wall time 70.05s)
INFO: Elapsed time 71.58s (preparation 1.45s, execution 70.13s)
INFO: CPU time 5.31s (user 0.00s, system 0.01s, bazel 5.29s)
INFO: Build completed successfully, 2 total actions
If any of the values is not collected, "???" is printed instead, like:
INFO: 415 processes: 1 internal, 414 processwrapper-sandbox.
INFO: Total action wall time 127.84s
INFO: Critical path 2.58s (setup 0.00s, action wall time 2.51s)
INFO: Elapsed time 38.37s (preparation 3.98s, execution 34.39s)
INFO: CPU time ???s (user 79.45s, system 22.39s, bazel ???s)
INFO: Build completed successfully, 415 total actions
Note: The values are only presented if the following flag is set: --experimental_stats_summary
The following flags must be set for the measurements to be collected: --experimental_collect_local_sandbox_action_metrics and --experimental_profile_cpu_usage (set by default)
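The "???" fallback described above can be sketched as follows (a minimal illustration with a hypothetical helper name; the real implementation differs): a measurement that was not collected is modeled as null and printed as "???".

```java
import java.util.Locale;

public class CpuTimeFormat {
  // Hypothetical helper (not the actual Bazel code): a null value models a
  // measurement that was not collected, which is printed as "???".
  static String formatSeconds(Double seconds) {
    return seconds == null ? "???s" : String.format(Locale.US, "%.2fs", seconds);
  }

  public static void main(String[] args) {
    System.out.println("user " + formatSeconds(79.45));  // user 79.45s
    System.out.println("bazel " + formatSeconds(null));  // bazel ???s
  }
}
```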
Part two
A unit test is added in runtime-tests. Test command: ./misc/bazel test --experimental_stats_summary //src/test/java/com/google/devtools/build/lib:runtime-tests
Note: Because the Reporter class was marked final, it could not be mocked in the unit test. We didn't find a better way to solve this issue, so in this commit the final keyword was removed from Reporter.
Hello, I couldn't reproduce the issue locally. Does anyone know where to get the logs below?
FAILED: //src/test/java/com/google/devtools/build/lib:runtime-tests (Summary)
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test.log
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test_attempts/attempt_1.log
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test_attempts/attempt_2.log
Here is the answer to my question above:
- Choose the failed test suite by pressing the "Details" link in GitHub. That will take you to https://buildkite.com
- Choose the failed test suite again on buildkite.com.
- Switch the tab from "Log" to "Artifacts" for the suite on buildkite.com to access the log files.
The new commit fixed the issue on macOS. It's because macOS used ',' to separate the integer and decimal fraction, instead of the '.' used on other OSes.
E.g.: on macOS it's '9,00'; on other OSes it's '9.00'.
@ecngtng It's entirely possible that the difference in punctuation is not due to being on macOS per se, but due to the Mac doing different i18n.
@larsrc-google So in this case, is it reasonable to distinguish the punctuation based on keywords in the OS name? Or is there a better way?
Distinguishing on OS might cause problems for users in other countries with different default Locale.
One alternative could be to get the JVM's decimal separator char for the current Locale.getDefault(), via DecimalFormatSymbols.getInstance().getDecimalSeparator(), and use that in the condition.
Another alternative could be to use String.format also when constructing the expected result in the test case.
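Both alternatives can be sketched as follows (a minimal demo, not the PR's actual test code):

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class DecimalSeparatorDemo {
  public static void main(String[] args) {
    // Alternative 1: ask the JVM for the locale-dependent decimal separator
    // and branch on it, instead of branching on the OS name.
    char sep = DecimalFormatSymbols.getInstance(Locale.getDefault()).getDecimalSeparator();
    System.out.println("default separator: " + sep);

    // Alternative 2: build the expected string with String.format, so the test
    // uses the same locale-dependent formatting as the code under test.
    System.out.println(String.format(Locale.US, "%.2f", 9.0));      // 9.00
    System.out.println(String.format(Locale.GERMANY, "%.2f", 9.0)); // 9,00
  }
}
```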
@ulrfa @larsrc-google Thanks for your good comments. The code has been updated according to your comments.
Is this the right place to expose it? Most users likely don't care about CPU time.
The idea behind --experimental_stats_summary is to provide the most useful information in a more concise way.
If a user cares, they can always look it up in the BEP and the trace profile.
Thank you @meisterT for reviewing!
AFAIK, BEP provides CPU time only for:
- The JVM process.
and the --profile json file contains CPU usage only for:
- The JVM Process.
- All processes running on local host (including other builds and other concurrently executing processes).
I assume the metrics above are relevant mostly for developers of the Bazel codebase itself. But this PR provides CPU time for:
- The JVM process. (“bazel”)
- All subprocesses spawned in sandboxes for the particular build. (“user” and “system”)
I think the CPU time, for the JVM and subprocesses combined, is one of the most relevant metrics for a user that would like to grasp the overall resource consumption of a build.
In the future, I imagine storing each subprocess's CPU time per action in the profile JSON file, and then calculating a summary grouped by mnemonic, to see how much CPU each tool in the build chain consumes. But I think this PR is a good start and makes the information conveniently available.
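The grouped-by-mnemonic summary imagined above could look like this (hypothetical data and class name, not existing Bazel code):

```java
import java.util.HashMap;
import java.util.Map;

public class CpuByMnemonic {
  public static void main(String[] args) {
    // Hypothetical per-action records (mnemonic, CPU seconds); in the imagined
    // future these would come from per-action entries in the profile JSON.
    String[][] actions = {
      {"CppCompile", "1.5"},
      {"CppCompile", "2.0"},
      {"Javac", "0.7"},
    };

    // Sum CPU time per mnemonic to see how much CPU each tool consumes.
    Map<String, Double> byMnemonic = new HashMap<>();
    for (String[] a : actions) {
      byMnemonic.merge(a[0], Double.parseDouble(a[1]), Double::sum);
    }
    System.out.println(byMnemonic);
  }
}
```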
Perhaps --experimental_stats_summary is not suitable for this. @meisterT, do you have ideas about other ways to control verbosity?
@meisterT What are your comments on @ulrfa's response? : )
@wilwell
Hello @ecngtng, could you please confirm that all the review comments are taken care of? Thanks.
@ckolli5 Yes, all the comments are taken care of.
Code has been changed according to comments and all checks have passed.
I just re-requested review from wilwell, but I didn't remove the requests for meisterT and larsrc-google. I don't know why those two were removed by the system, or whether I did something wrong. Anyway, any comments are welcome!
@wilwell Thanks for your comments. Regarding your comment 'Could you move please this inititalization on executorInit?', I tried moving the initialization to executorInit as you mentioned, but some test cases failed afterwards. It seems safer to keep them here.
@wilwell Per your comment 'Call executorInit with options, instead of creating this method', I made some code changes. The tests passed in my local environment, but two failures were triggered when the changes were pushed to GitHub. After checking the related logs, I haven't found the root cause yet. Could you please help check and give any clue? Thanks!
From the logs, we found the failure was because one test failed in 'Unit tests for tools/test/collect_cc_code_coverage.sh' (i.e., test_cc_test_coverage_gcov). 'collect_cc_code_coverage.sh' collects code coverage for C++ sources, but our code change doesn't touch C++. I'm curious how it could be affected.
The problem with tests looks strange. Have you tried to merge/pull master branch into yours?
Do you mean to rebase my commit?
Yes, rebase your commit on master and rerun the tests, please.
@wilwell All checks have passed. If you have any further comments, please feel free to let us know.
@larsrc-google Could you please also take a look at this change? If everything is OK, I am going to merge it.
@larsrc-google Could you please help take a look at this change, as it has been pending for quite a while? : )
@larsrc-google The code has been changed according to your comments. Could you please check? Thanks!
Thank you. You missed one comment from before, and I added a couple small things.
@larsrc-google Thanks for your good comments. All of them have been addressed as you suggested, except the one I'm not sure I understood. About "Zeroing them on startup would also make it clearer that they are properly initialized.": the zeroing is done in both executorInit() and buildComplete(). Is that not OK?
I don't see the zeroing happening in executorInit(), only in buildComplete(). If you zeroed them in executorInit(), that would be sufficient.
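The zeroing pattern being discussed could be sketched as follows (field and method names here are assumptions for illustration, not the actual Bazel implementation):

```java
// Sketch of the zeroing pattern under discussion; all names are hypothetical.
public class CpuTimeCollector {
  private double userTimeSec;
  private double systemTimeSec;

  // Zeroing here makes it explicit that the accumulators start from a clean
  // state for each build, as the reviewer suggested.
  void executorInit() {
    userTimeSec = 0;
    systemTimeSec = 0;
  }

  // Accumulate the CPU time reported for each spawned subprocess.
  void addSpawnResult(double userSec, double systemSec) {
    userTimeSec += userSec;
    systemTimeSec += systemSec;
  }

  // Report and reset after the build completes.
  void buildComplete() {
    System.out.printf("user %.2fs, system %.2fs%n", userTimeSec, systemTimeSec);
    userTimeSec = 0;
    systemTimeSec = 0;
  }

  double userTimeSec() { return userTimeSec; }
}
```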
@larsrc-google You're right. After checking, I found they were indeed missing from the last commits. I added them back now. (Very interesting; I had checked the code earlier and was sure they were merged several days ago.)
@larsrc-google The code has been changed and all checks passed. Could you please check? Thanks!