In verbose mode, log Error Prone performance details
In e5ffbe4b14f2c4e59aef331e308f23d602fdd431 Error Prone added support for tracking the time spent by each check. However, this feature was not exposed in a way that OSS users of Error Prone can easily consume. This PR is a proposal of how such support may be added. Feedback very welcome!
From the commit message:
This change introduces a `TaskListener` which logs collected
`ErrorProneTimings` post compilation.
Additionally, since `SuggestedFixes#compilesWithFix` is a very heavy
operation which significantly contributes to overall runtime, any usage
of this method is also logged. The rationale behind this decision is that if
many `compilesWithFix` checks fail, this can "silently" slow down
the build. By logging such cases one can either invest time in
optimizing the associated check or manually improve the code that
triggers the compilation attempts.
After this change the information is exposed when compiling with -verbose.
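For illustration, here is a minimal sketch (not the PR's actual code) of what such a `TaskListener` might look like; it assumes that `ErrorProneTimings` exposes the per-check durations recorded since e5ffbe4b via `instance(Context)` and `timings()`:

```java
import com.google.errorprone.ErrorProneTimings;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;
import com.sun.tools.javac.util.Context;
import com.sun.tools.javac.util.Log;
import java.time.Duration;
import java.util.Map;

/** Logs per-check Error Prone timings once the compilation has finished. */
final class TimingLoggingTaskListener implements TaskListener {
  private final Context context;

  TimingLoggingTaskListener(Context context) {
    this.context = context;
  }

  @Override
  public void finished(TaskEvent event) {
    // Report only once, after the entire compilation has completed.
    if (event.getKind() != TaskEvent.Kind.COMPILATION) {
      return;
    }
    Map<String, Duration> timings = ErrorProneTimings.instance(context).timings();
    Log log = Log.instance(context);
    timings.forEach(
        (check, duration) ->
            log.printRawLines(
                Log.WriterKind.NOTICE, check + ": " + duration.toMillis() + " ms"));
  }
}
```

In practice the listener would be registered by the Error Prone plugin itself, and only when the additional logging is requested.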
Rebased and resolved conflicts. What does the Error Prone team think of this PR?
I have rebased the branch and squashed in a small change to also expose the initialization time recorded since 8ddb7cbb05dc4b278836fd5019751c4e225323d9.
NB: I see thumbs-up from Uber, Red Hat and Picnic people; though this PR might not have much utility inside Google thanks to other (certainly superior) integrations, it might help open source users.
Thanks for this!
I have some questions about the details, but overall making this useful/available in OSS seems like a good thing to do.
For some background on e5ffbe4b14f2c4e59aef331e308f23d602fdd431, we're currently using that to sample performance for a fraction of javac invocations in our remote build setup. The statistics are written to a separate file that gets collected, so we can look at the aggregate data. For understanding the performance of a single compilation I usually just use a profiler, but having the aggregates is useful to understand how checks perform overall, especially if a check only performs badly on some compilations.
Is the intended use-case here that if you wanted to dig into the performance of a build, you'd add -verbose, and then review the output from Error Prone?
Would you prefer having the output in the build log, vs. writing it to a separate file?
Another question is about the use of -verbose, vs. a separate Error Prone-specific flag. Using -verbose could be nice for users that benefit from the additional information and might not know about a separate flag, but could also be unwanted if they just wanted the standard -verbose output. WDYT?
Is the intended use-case here that if you wanted to dig into the performance of a build, you'd add -verbose, and then review the output from Error Prone?
Yes, so far I've used this primarily after Error Prone (Support) upgrades by specifying -Dmaven.compiler.verbose on the command line, and then grep-ing the result. Bare-bones, but it did the job when looking for significant changes in performance.
Would you prefer having the output in the build log, vs. writing it to a separate file?
I like the idea of outputting the result to a separate file, especially if the result comes in an easily parsable format such as TSV or JSON. In this case it seems better to use a custom flag, since intuitively I would expect -verbose to only influence "regular" compiler output. An added bonus of a custom flag would be that it could double as a means of specifying the output file location. (Maven users would then provide a value that starts with ${project.build.directory}.)
Coincidentally my colleagues @rickie and @chamil-prabodha have started looking into the more general topic of "Error Prone metrics", which (IIRC) includes timing metrics, but also statistics on how often violations trigger or patches are applied. That project was partially inspired by the Maven OpenTelemetry extension, which might hint at another possible approach. Perhaps they can share some details on their current thinking around this topic. :eyes:
which (IIRC) includes timing metrics, but also statistics on how often violations trigger or patches are applied.
Indeed, @Stephan202, you explained it well.
Our main goal is to know how many violations Error Prone (Support) flags in all of our repositories to measure the impact. This holds for both Error Prone checks and Refaster rules.
For every violation we want to collect the exact location (line number, file, and module). We want to achieve this by using something like the DiagnosticCollector (see this test from ErrorProneJavacPluginTest.java for a similar example). Aggregating this together with timing metrics, it will give us a good insight into how checks are performing. We can use this data to decide what we should work on or which checks we should improve.
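As a rough sketch of that setup (assuming Error Prone is available as a javac plugin on the compilation classpath; the source file name and options below are placeholders):

```java
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

final class ViolationCollector {
  public static void main(String[] args) throws Exception {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
    try (StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(diagnostics, null, null)) {
      // Compile with the Error Prone plugin enabled; every emitted diagnostic is collected.
      compiler
          .getTask(
              null,
              fileManager,
              diagnostics,
              List.of("-Xplugin:ErrorProne", "-XDcompilePolicy=simple"),
              null,
              fileManager.getJavaFileObjects("SomeSource.java"))
          .call();
    }
    // Aggregate the exact violation locations (file, line number) afterwards.
    diagnostics.getDiagnostics().forEach(d ->
        System.out.printf(
            "%s:%d %s%n",
            d.getSource() == null ? "<no source>" : d.getSource().getName(),
            d.getLineNumber(),
            d.getMessage(null)));
  }
}
```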
For example, enabling our Refaster rules by default would have a significant impact on build time, so currently this is behind a Maven profile. We are working on improving Refaster's performance. Having timing metrics makes it easier to get data on this.
Would you prefer having the output in the build log, vs. writing it to a separate file?
Writing it to a separate file is preferred as we can persist that to later do an analysis.
Taking another look at this as I continue slowly working through the PR backlog, the only part of this I'm wondering about is:
Another question is about the use of -verbose, vs. a separate Error Prone-specific flag. Using -verbose could be nice for users that benefit from the additional information and might not know about a separate flag, but could also be unwanted if they just wanted the standard -verbose output. WDYT?
Would using a new Error Prone-specific flag instead of -verbose work for your use-case of this feature, or do you strongly prefer -verbose? I still like the idea of a separate flag to avoid changing the behaviour of the existing -verbose flag. But maybe there are aspects of this I'm not considering, and usage of -verbose is probably fairly low anyways.
Would using a new Error Prone-specific flag instead of -verbose work for your use-case of this feature,
I think it would work for us: based on some tests, Log#printVerbose output does show up in the build log even if verbose compiler output isn't enabled (i.e. without -Dmaven.compiler.verbose), so using a separate flag wouldn't require users to set both flags. Arguably this approach is even nicer, since users will be able to see the timing results without having to wade through tons of other output.
If we introduce a new flag for console output, I suppose it could take an integer argument, such that users can specify e.g. -XepListNSlowestChecks=10. (Not yet in love with that name; suggestions welcome.)
NB: re-reading the thread, there is some appetite for outputting the timings to a separate file. Makes sense, since this would allow for easier analysis. Such a feature could perhaps be enabled using a flag such as -XepCheckTimingOutputFile=somePath (again, name TBD). This would however raise new questions about the output format: ideally we'd investigate whether there are any standard file formats for such a use case. I'm not aware of any, but didn't carefully search.
Since these features aren't mutually exclusive, I suppose we can focus on the former (console output) here, and defer any decision about the latter. WDYT? I'm up for updating the PR accordingly if you agree.
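To make the two ideas concrete, a hypothetical helper along these lines (the flag names above don't exist yet; the behaviour and file format shown here are placeholders):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

final class CheckTimingReporter {
  /** Prints the {@code n} slowest checks, slowest first (the "-XepListNSlowestChecks=N" idea). */
  static void printSlowestChecks(Map<String, Duration> timings, int n) {
    timings.entrySet().stream()
        .sorted(Map.Entry.<String, Duration>comparingByValue(Comparator.reverseOrder()))
        .limit(n)
        .forEach(e -> System.out.printf("%s: %d ms%n", e.getKey(), e.getValue().toMillis()));
  }

  /** Writes one {@code check<TAB>millis} line per check (the "-XepCheckTimingOutputFile" idea). */
  static void writeTsv(Map<String, Duration> timings, Path outputFile) throws IOException {
    String tsv =
        timings.entrySet().stream()
            .map(e -> e.getKey() + "\t" + e.getValue().toMillis())
            .collect(Collectors.joining("\n", "check\tmillis\n", "\n"));
    Files.writeString(outputFile, tsv);
  }
}
```

TSV has the advantage of being trivially grep-able and easy to load into most analysis tools, though JSON would work equally well.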
If we introduce a new flag for console output, I suppose it could take an integer argument, such that users can specify e.g. -XepListNSlowestChecks=10. (Not yet in love with that name; suggestions welcome.)
-XepListNSlowestChecks= is fine with me. Another option would be -XepLogVerbosePerformanceDetails= or something, which is maybe more descriptive of the overall feature but doesn't explain why it takes an int value.
It also occurred to me we could use -XepOpt: and skip having to define another top-level flag, but to be clear I have no objections to adding top-level flags.
Since these features aren't mutually exclusive, I suppose we can focus on the former (console output) here, and defer any decision about the latter. WDYT? I'm up for updating the PR accordingly if you agree.
Yes, absolutely :) I don't think this needs to block on designing the other idea about file output.
Tnx for the input @cushon! I'll give some thought to the flag names. I planned to have a look at this PR this weekend, but it didn't happen. Will hopefully circle back to it somewhere in the coming days. :crossed_fingers: