google-java-format Different formatting of block line comments with openjdk 23+37-2369

Applies to: 1.23.0

I came across this oddity - this file from Apache Lucene: SandboxFacetsExample.txt

doesn't need any changes with jdk17-jdk22:

> java -version
openjdk version "22.0.1" 2024-04-16
OpenJDK Runtime Environment Temurin-22.0.1+8 (build 22.0.1+8)
OpenJDK 64-Bit Server VM Temurin-22.0.1+8 (build 22.0.1+8, mixed mode, sharing)
> java -jar google-java-format-1.23.0-all-deps.jar -n SandboxFacetsExample.java

but will result in reformatting under jdk 23 (ea, 23+37-2369):

> java -version
openjdk version "23" 2024-09-17
OpenJDK Runtime Environment (build 23+37-2369)
OpenJDK 64-Bit Server VM (build 23+37-2369, mixed mode, sharing)

>java -jar google-java-format-1.23.0-all-deps.jar -n SandboxFacetsExample.java
SandboxFacetsExample.java

Fully reproducible on Windows and Linux. The diff is:

--- a/SandboxFacetsExample.java
+++ b/SandboxFacetsExample.java_
@@ -149,7 +149,8 @@ public class SandboxFacetsExample {
         new FacetFieldCollectorManager<>(defaultTaxoCutter, defaultRecorder);

     //// (2.1) if we need to collect data using multiple different collectors, e.g. taxonomy and
-    ////       ranges, or even two taxonomy facets that use different Category List Field, we can
+    ////       ranges, or even two taxonomy facets that use different Category List Field, we
+    // can
     ////       use MultiCollectorManager, e.g.:
     // TODO: add a demo for it.
     // TaxonomyFacetsCutter publishDateCutter = new

Aug 29 '24 18:08 dweiss

I toyed a bit with debugging this and there's a difference in how these line comments appear to JavaCommentsHelper.wrapLineComments. With JDK21, it receives that comment line-by-line. With JDK23, it receives a concatenation of all lines, with lines 2 and on having a whitespace prefix:

java21:

////       ranges, or even two taxonomy facets that use different Category List Field, we can

java23:

//// (2.1) if we need to collect data using multiple different collectors, e.g. taxonomy and
    ////       ranges, or even two taxonomy facets that use different Category List Field, we can
    ////       use MultiCollectorManager, e.g.:

this triggers the difference because the split condition now sees different line length for those subsequent lines.

while (line.length() + column0 > Formatter.MAX_LINE_LENGTH) {
...

Hope this helps somehow.

Aug 29 '24 19:08 dweiss

For what it's worth, core tests pass when I do this:

diff --git a/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java b/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java
index d34ecc4..d54b231 100644
--- a/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java
+++ b/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java
@@ -49,7 +49,11 @@ public final class JavaCommentsHelper implements CommentsHelper {
     List<String> lines = new ArrayList<>();
     Iterator<String> it = Newlines.lineIterator(text);
     while (it.hasNext()) {
-      lines.add(CharMatcher.whitespace().trimTrailingFrom(it.next()));
+      if (tok.isSlashSlashComment()) {
+        lines.add(CharMatcher.whitespace().trimFrom(it.next()));
+      } else {
+        lines.add(CharMatcher.whitespace().trimTrailingFrom(it.next()));
+      }
     }
     if (tok.isSlashSlashComment()) {
       return indentLineComments(lines, column0);

but I'm not sure whether this is the right fix. Perhaps it'd be good to find out why the token text is different with jdk 23 (as it's the primary cause of the problem).

Aug 29 '24 19:08 dweiss

Thanks for the bug and the investigation! Presumably the difference in JDK 23 is due to the new support for markdown doc comments.

Sep 10 '24 15:09 cushon

I think you're right - that's a spot-on observation. Interestingly, this comment appears inline with the code, not as a javadoc of anything in particular [1]. Could be a regression in the comment parser worth reporting to openjdk.

[1] https://github.com/apache/lucene/blob/304d4e7855deb39b4650d954d027ce8697873056/lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java#L151-L153

Sep 10 '24 19:09 dweiss

I think it's probably a deliberate change on the javac side, to be able to process the entire /// comment as a single token instead of multiple line comments. Your fix of removing the leading and trailing whitespace seems OK to me, the formatter will add any necessary leading whitespace back as part of indentLineComments, it seems reasonable to remove the leading whitespace to fix the line length computation.

Are you interested in sending a PR?

Sep 11 '24 22:09 cushon

Thank you for adding the regression test. I've created a PR with the basic workaround I suggested, hope it helps.

Sep 12 '24 08:09 dweiss