[Discussion] Serialization of Error Prone diagnostics
A question in three parts:
- Is there any option/flag for serializing Error Prone diagnostics (i.e. the Description object) into some machine parsable format (i.e. some standard like SARIF or even a custom JSON), in addition to emitting compiler errors/warnings?
- If that option doesn't exist, has it ever been previously discussed?
- Assuming it isn't already there somewhere, is this something that would be desirable upstream? Any ideas on what it might look like?
I might well be missing previous discussions, but I tried to search the repo and the docs and I didn't find anything of the sort.
The context around the ask is that we are looking for ways to surface the diagnostics from non-blocking checks (i.e. SUGGESTION and for some configurations WARNING) both inside an IDE and also during code review. We can potentially parse the output of javac, but that means over-emitting stuff (again, see SUGGESTION level checks) and also seems more fragile than just somehow getting the results serialized from Error Prone, with potentially more detailed location information.
cc: @msridhar @cpovirk @raviagarwal7
CC @rickie and @chamil-prabodha; this question relates to the discussion in #1474; in particular https://github.com/google/error-prone/pull/1474#issuecomment-1370633206 :)
Fwiw, I feel like this should not be in ErrorProne but in JavaC, as ErrorProne uses the JavaC diagnostics API, and I'd also like such a file to include Xlint warnings and other compilation errors.
This issue is related https://github.com/google/error-prone/issues/444. It's fair to say that there is interest in having something for this based on the number of votes there.
As @Stephan202 mentioned, this comment contains a rough idea on how collecting relevant diagnostics could look like: https://github.com/google/error-prone/pull/1474#issuecomment-1370633206.
Fwiw, I feel like this should not be in ErrorProne but in JavaC, as ErrorProne uses the JavaC diagnostics API, and I'd also like such a file to include Xlint warnings and other compilation errors.
I agree adding the serialization directly to the Java compiler would be ideal. I'm not sure how easy it would be to upstream such a contribution though. Also, I would expect it to be significantly easier to update Error Prone than to update the version of the JDK used for building (probably one would have to update to the latest JDK version to get the JavaC support). So I think it's still worth investigating whether such support in Error Prone could be added.
An alternative would be to have a DiagnosticListener-based library that build tools calling javax.tools (Maven, Gradle, etc.) could integrate. It would take more time to integrate there than in ErrorProne, but likely less than in the JDK, and would be backward-compatible with older JDKs.
An alternative would be to have a DiagnosticListener-based library that build tools calling javax.tools (Maven, Gradle, etc.) could integrate. It would take more time to integrate there than in ErrorProne, but likely less than in the JDK, and would be backward-compatible with older JDKs.
Interesting! Could this be done as a javac plugin, like Error Prone itself? Then maybe it could be incorporated without even modifying the build tools.
I've just made a proof of concept here: https://github.com/tbroyer/javac-diagnostics-serializer
It's a javac plugin that uses internal APIs (like ErrorProne) to setup a Log.DiagnosticHandler such that diagnostics are still printed to the standard output (and it won't mess up with setups that might use diagnostic listeners). It's accompanied by a gradle plugin to setup the appropriate options so all you have to do is apply the plugin and add a dependency to the javac plugin (I'll add a default dependency soon; I'll also a Maven integration test). It requires JDK 9 at a minimum (maybe 11, I haven't actually tested with 9) because it depends on a TaskEvent.Kind.COMPILATION event to close the output stream.
For now, as a PoC, it just prints diagnostics in plain text with <filename>:<line>: <kind>: <message> format.
Now, should an actual such plugin emit SARIF? Checkstyle XML? (tools like reviewdog support the latter but not the former) Should it allow pluggable reporters?
@tbroyer this looks amazing! So cool you were able to get the proof of concept to work!
In terms of output format SARIF seems to be getting some broad support, but I don't personally have a strong preference.
Hmm, looking more closely at SARIF, ErrorProne, and JavaC, there's a lot of lost information between an ErrorProne's Description, which quite closely matches SARIF, and the emitted JavaC diagnostics.
Either ErrorProne would have to somehow cooperate with such a tool to expose its information unfiltered (a diagnostic contains a message bundle key and its arguments, for ErrorProne the key is fixed and argument is a formatted message as a string; change it to an object with the formatting done in its toString() and the DiagnosticHandler could get the information out of that object), or file-reporting should be built into ErrorProne (with an additional DiagnosticHandler to capture JavaC's own messages)
@cushon @cpovirk WDYT?
I'm not the most informed on all this, but my understanding is that, while we do track the normal diagnostics issued by javac, our "advanced" integrations of Error Prone (for code review and so forth) integrate into DescriptionListener, which exposes the Description with all its fixes, etc. (You can see how JavacErrorDescriptionListener reduces that all down to the javac diagnostic, IIUC.)
I don't immediately see an obvious way to plug in your own DescriptionListener from a normal javac run. I want to say that most of our tools invoke javac internally, so they can plug in whatever DescriptionListener they want.
Again, any of that may be wrong. And I certainly don't have a vision for what sorts of configurability we might want to provide.
I thought that maybe patching mode would offer some tips (since clearly it has access to the details of any suggested fixes), but it looks like its own special mode, not some way to sneak a custom listener in.
Here's a PoC of the kind of cooperation I hinted above:
- on ErrorProne's side: https://github.com/google/error-prone/compare/master...tbroyer:error-prone:javac-diagnostics-serializer
- on the diagnostic handler's side: https://github.com/tbroyer/javac-diagnostics-serializer/blob/eade50d26181d158eee7f129e6f3188d4795e5f9/javac-plugin/src/main/java/net/ltgt/javacdiagnosticsserializer/SerializingDiagnosticHandler.java#L52-L85
Oh, thanks, that clarifies your toString() comment for me. Not to say that it was unclear, just that I'm more of a person who throws Error Prone checks over the wall than someone familiar with the architecture :)
Given my unfamiliarity with the architecture, I can't confidently do anything more than run our internal tests on your commit and make sure that nothing explodes (which I assume it won't). I'll at least do that now.
I wanted to chime in to say I haven't spent a lot of time considering the details, but I am supportive of Error Prone providing better support for structured diagnostics.
(In our our own internal use we've been getting by with scraping information from build logs with regexes, which is worse but simpler, and now that things like SARIF are getting some traction it might make sense to revisit that.)
@tbroyer your prototype looks encouraging, it seems like an elegant way to enable this with minimal changes and without committing to particular format in Error Prone itself.
If anyone wants to explore what it would like like to have EP take a flag to specific an output file and have it write e.g. SARIF output directly, I'd also be curious what that would look like.
If anyone wants to explore what it would like like to have EP take a flag to specific an output file and have it write e.g. SARIF output directly, I'd also be curious what that would look like.
I spent some time today trying to do such a thing; I'm having a hard time finding how to pass my SarifWriter to the JavacErrorDescriptionListener. While it could be called from any other DescriptionListener, JavacErrorDescriptionListener already computes the AppliedFixes though so it would probably be ideal to use that rather than re-compute those fixes. This code still needs a lot of work: starting with testing.
I'd appreciate a first architectural review of the changes, including how well/bad that would play with other Google-internal code.
I'd appreciate a first architectural review of the changes, including how well/bad that would play with other Google-internal code.
I'm happy to take a look, can you clarify which changes? Do you mean the https://github.com/google/error-prone/compare/master...tbroyer:error-prone:javac-diagnostics-serializer changes you shared earlier, or were there additional changes investigating plumbing SarifWriter into JavacErrorDescriptionListener to share?
Ha ha, looks like I forgot to include the link in my previous comment! That would be https://github.com/google/error-prone/compare/master...tbroyer:error-prone:sarif
I'm curious, is there any progress on this? Did you manage to take a look @cushon 😄?
Thanks for the nudge :)
I spent some time today trying to do such a thing; I'm having a hard time finding how to pass my
SarifWriterto theJavacErrorDescriptionListener. While it could be called from any otherDescriptionListener,JavacErrorDescriptionListeneralready computes theAppliedFixes though so it would probably be ideal to use that rather than re-compute those fixes.
I think JavacErrorDescriptionListener is the only thing that uses DescriptionListener.Factory, and there are only two calls to DescriptionListener.Factory#getDescriptionListener. I think we could probably refactor those to take a Consumer<AppliedFix> or something, and having ErrorProneAnalyzer pass theSarifWriter in when it creates the JavacErrorDescriptionListener. Does that sounds plausible?
Ha, looking back into it, I don't think AppliedFix is really needed actually: a SARIF fix is more or less a direct representation of a Fix, so the approach used in DescriptionBasedDiff should be enough. https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317881
Hi @tbroyer, I'm wondering; do you have plans to work on this again in the coming weeks/months?
The reason I'm asking is that we are still really interested in this feature. Especially in the context of Error Prone Support and its usage at Picnic.
In case you are not planning on spending time on this I want to see if I can get time for this internally. That will require some planning so it's not a given.
Hi @rickie. I was mainly prototyping, with no real intention to ever finish the work actually; so feel free to take where I left it :+1:
For those using Gradle, they're apparently working on surfacing the Java diagnostics to their new "Problems API" that's used in IDEs (https://github.com/gradle/gradle/issues/27596), and someone asked if that Problems API could be used to add inline comments in GitHub Pull Request (https://github.com/gradle/actions/issues/30). So if that's your use case, it might come independently of Error Prone, and you can go vote for those issues.
I think it could also be a good opportunity to work with Gradle here in defining an API for the diagnostics description so Error Prone could pass fix suggestions down to the Problems API consumers (IDEs could integrate "quick fixes" applying those patches, and integrations with other tools like GitHub Pull Requests could add them as suggestions in the comments); similar to how I did it in https://github.com/google/error-prone/issues/3766#issuecomment-1426834718 (code) but without being specific to Error Prone.
I was reminded of this by discussion in #5031 and #444
To try to recap, I think the two options we've discussed here are:
-
Add a flag to Error Prone to cause it to emit structured diagnostics to a file, sort of similar how https://errorprone.info/docs/patching allows writing patches to a file. This should probably use SARIF. This is the approach that https://github.com/google/error-prone/issues/3766#issuecomment-1429253107 started prototyping.
-
Have javac or build tools support writing SARIF, and then it would be possible to get structured diagnostics from Error Prone more or less for free. It might still require some work in EP to coordinate and make some information available, like in https://github.com/google/error-prone/issues/3766#issuecomment-1931812644. Gradle has taken some steps in this direction, I don't know if other build system have. I haven't heard of plans to do this directly in javac.
I think (2) is an appealing long term direction for the ecosystem, but will probably take time. I think doing (1) in the shorter term is still worth considering.