lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Use @snippet javadoc tag for snippets

Open jpountz opened this issue 10 months ago • 31 comments

Description

Now that main requires Java 21, we can start using the @snippet javadoc tag, which is quite more convenient to use than the <pre class="prettyprint"></pre> HTML tags we are currently using.

One problem I noticed is that gradlew tidy messes up with formatting when @snippet tags are used.

jpountz avatar Feb 18 '25 22:02 jpountz

https://github.com/google/google-java-format/issues/789 https://github.com/google/google-java-format/issues/886

Unfortunately, I think these make this issue a no-go to unless you want to resign from using google java formatter... which I don't think we should do.

dweiss avatar Feb 18 '25 22:02 dweiss

Related: if we bump main to java 23+ (https://github.com/apache/lucene/issues/14229) then we could do snippets in javadoc with the typical markdown mechanisms.

rmuir avatar Feb 18 '25 23:02 rmuir

The markdown would have a huge advantage for the many analyzers with xml-style prettyprint docs today. I don't think @snippet works with languages other than java.

For me, editor displays the current "prettyprint" of analyzers XML docs without any syntax highlighting:

Image

On the other hand, with markdown you'd get this:

<fieldType name="text_arnormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

rmuir avatar Feb 19 '25 00:02 rmuir

@dweiss what do you think about exploring the option to swap out spotless java.GoogleJavaFormatStep with java.EclipseJdtFormatterStep but using google-java format config: https://github.com/google/styleguide/blob/gh-pages/eclipse-java-google-style.xml

I know the history that google says nobody else's formatter is good enough, they say you need "their" formatter for the google style and that others eclipse/intellij can't do it right, but their formatter can't support @snippet nor markdown comments (it will mess them up), so they fail technically, they don't keep up with java standard.

rmuir avatar Mar 25 '25 01:03 rmuir

The Palantir one is no better off from what I can tell, it is a fork of the google one with some cosmetic tweaks. I know eclipse JDT doesn't choke on these constructs, although I don't have much experience with their formatter, their compiler has been pretty good and they are generally responsive about new java features.

rmuir avatar Mar 25 '25 01:03 rmuir

I have been using google's tool mostly because it's been fairly solid in terms of formatting between releases (so no surprise changes on updates). I also like (or have slowly grown accustomed to) what they do with the indents and more complex code blocks. There are quirks, sure, but they'll be present with any tool.

The beauty of having a single standard should be that the tool used to format the code is irrelevant as long, as it ends up with the same output. If we can make eclipse do the same thing (or even better, format snippets reliably) then it's a transparent change for the better - I don't have a problem with that at all.

dweiss avatar Mar 25 '25 05:03 dweiss

It's close but there are many differences. See the commit above and try running it on one of the modules, for example suggest (./gradlew -p lucene/suggest spotlessApply).

Maybe it can be made closer, I haven't tried.

dweiss avatar Mar 25 '25 06:03 dweiss

@dweiss thank you so much for that starter commit for evaluation. I will try it tonight and fire up eclipse and see what our options are.

I finally finished parser (https://github.com/rmuir/tree-sitter-javadoc) and am now looking at options to (ab)use it, to convert our docs to markdown automatically, when I discovered that google-java-format will mess up markdown comments, e.g. too-long /// will wrap to another line with // and break everything.

So currently, there is no way to escape from the prettier because neither @snippet nor markdown can work with the formatter, so no way to make a patch that passes build.

rmuir avatar Mar 25 '25 11:03 rmuir

Nice!

So... google java format has this option, at least in the cmd line version: Image

If nothing else works, we could just make a multipass and format javadocs using a different tool than the rest of the code... Or fork gjf and implement proper javadoc formatting, which should be a fun project to work on.

dweiss avatar Mar 25 '25 12:03 dweiss

I think in the markdown case, the bug I saw was that it didn't treat /// as javadoc but as an ordinary inline comment. But I can experiment with the option still.

rmuir avatar Mar 25 '25 12:03 rmuir

https://github.com/google/google-java-format/issues/1193

Disabling Javadoc formatting doesn't prevent either issue.

So it seems it's broken entirely. Argh.

dweiss avatar Mar 25 '25 13:03 dweiss

I played with this a bit and reduced noise in two ways:

Original file:

113 files changed, 3656 insertions(+), 5216 deletions(-)

  1. Disable reformatting of Apache License header:
<setting id="org.eclipse.jdt.core.formatter.comment.format_header" value="false"/>

93 files changed, 2525 insertions(+), 3861 deletions(-)

  1. Disable reformatting of javadocs comments We could fight it, but my goal is to move them all to markdown format which google doesn't support anyway. Most differences are in things such as indentation of html tags. So the differences are just not relevant and create noise when doing comparison:
<setting id="org.eclipse.jdt.core.formatter.comment.format_javadoc_comments" value="false"/>

75 files changed, 1922 insertions(+), 3380 deletions(-)

Eclipse has a lot of options and I think this file is just out of date. With the unnecessary noise out of the way, it is a matter of exploring all the settings and trying to reduce the differences more. I'm guessing there would always be differences, but it would be nice to minimize them.

rmuir avatar Mar 26 '25 12:03 rmuir

For the record those diffstats were based on ./gradlew -p lucene/suggest spotlessApply and include the changes of the patch/formatter XML itself

rmuir avatar Mar 26 '25 12:03 rmuir

@dweiss I also wonder, with an "autoformat" workflow, if we even care so much.

I don't understand what is so sacrosanct about google's format: to me it is ugly. Snippet tag is from java 18 (6 releases back) and google doesn't care, they are a big corporation and probably the type to keep code on e.g. java 8. I don't think we should weigh their opinions very much on anything.

All autoformatters lead to ugliness at times, it is just the tradeoff you make to avoid hassles, and still reap the benefits of avoiding formatting bikesheds, noise in PRs, etc.

I just think autoformat the code in a consistent way, call it a day.

rmuir avatar Mar 26 '25 12:03 rmuir

I just think autoformat the code in a consistent way, call it a day.

I agree, it does not matter which one you pick if it's an automated process.

I don't understand what is so sacrosanct about google's format: to me it is ugly.

My initial choice of gjf was motivated purely on subjective experience - I just liked the way it formatted code, that's it (I still do). Perhaps one more contributing factor was that I didn't want to play with a gazillion of Eclipse's options... this also facilitates the discussions concerning "which settings are best"... it's what it is, end of discussion. :)

But I don't take it personally, really. I myself filed one or two issues with them and there is definitely corporate inertia in changing things. So if switching to Eclipse lets us make advancements (like with markdown javadocs), let's just do it. I'm fine with it.

dweiss avatar Mar 26 '25 13:03 dweiss

We'd probably have to apply reformatting to 10x and main to keep cherry picking easier. Other than that - it's a simple thing to do.

dweiss avatar Mar 26 '25 13:03 dweiss

I will play with the "don't reformat javadoc option". Maybe it's an easier solution to these problems? If we can coerce Google formatter to treat /// as javadoc then problem solved. Markdown is sensitive to indentation etc so it needs to not mess with it.

rmuir avatar Mar 26 '25 13:03 rmuir

I've toyed with it a bit but I don't see a way for it to not break those /// comments. An alternative is to fork it, fix what we need and then use the forked version from spotless. This is a doable alternative to using Eclipse's formatter - I really don't mind either.

dweiss avatar Mar 26 '25 16:03 dweiss

https://github.com/google/google-java-format/blob/master/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java#L46-L60

All it takes would be to preserve any formatting in markdown comments (the /// lines) - a dumb check above would do the trick, I think.

dweiss avatar Mar 26 '25 16:03 dweiss

@dweiss I'm wondering if we could send them a PR such that any /// line comment respects the --skip-javadoc-formatting flag (or some other flag to say "dont mess around"). it would at least allow projects to move forward with markdown javadoc comments, while still using google-java-format.

rmuir avatar Mar 26 '25 16:03 rmuir

Yeah. I'll take a look at that, interesting. Part of the problem is that different Java versions seem to be returning a different tokenization of those comment strings. Seems like something has changed even from this issue that I filed.

https://github.com/google/google-java-format/issues/1153

because when I run it now on this input:

/// In the following indented code block, `@Override` is an annotation, and not a JavaDoc tag. You must not wrap this line. Even though it's long.
///
///     @Override
///     public void m() ...
///
/// Likewise, in the following fenced code block, `@Override` is an annotation,
/// and not a JavaDoc tag:
///
/// ```
/// @Override
/// public void m() ...
/// ```
class Example {
    public static void main(String... args) {
    	/// Foo, bar baz.
        System.out.println("Hello World!");
    }
}

it breaks the first long line incorrectly:

/// In the following indented code block, `@Override` is an annotation, and not a JavaDoc tag. You
// must not wrap this line. Even though it's long.

I'll take a look in the evening. I don't think it should touch those /// lines at all.

dweiss avatar Mar 26 '25 17:03 dweiss

@dweiss I think that is because google-java-format uses internal JDK compiler apis to parse it. just like error prone. it is why you have to add all the opens?

rmuir avatar Mar 26 '25 17:03 rmuir

Yes, that's correct - https://github.com/google/google-java-format/issues/1153#issuecomment-2344790653

dweiss avatar Mar 26 '25 17:03 dweiss

Here is what I did.

  • added a brute-force non-formatting to any /// line comments in my fork of google-java-format [1]
  • added a local, precompiled binary of the above to my fork of Lucene's repository and modified spotless to use it. [2][3]

It seems to work (I've modified one of the classes, intentionally leaving a super-long line there). This is just a PoC that it's doable... I'm not sure if this patch would be accepted to google-java-format as is (it makes an assumption that anything starting with /// should just be left alone). I also don't think we can store a binary blob of google java format in Lucene repository.

I can try to initiate a discussion with google-java-format folks first. If this doesn't work, I can publish my fork under my own coordinates to Maven Central - then we won't need the local binary blob and things should just work.

[1] https://github.com/google/google-java-format/compare/master...dweiss:google-java-format:preserve-markdown-like-lines [2] https://github.com/apache/lucene/compare/main...dweiss:lucene:gjf-markdown-friendly?expand=1 [3] https://github.com/dweiss/lucene/tree/gjf-markdown-friendly

dweiss avatar Mar 26 '25 20:03 dweiss

@dweiss very nice. the /// can have leading whitespace in front of it which is preserved too. I dont know how their parser works but you can simulate the leading-case by adding a method, e.g. by adding comment to one of the members of your FacetFieldCollector in your branch

Its this case:

/// long comment...
public class foo {
  /// another long comment...
  public void bar(){}
}

rmuir avatar Mar 26 '25 20:03 rmuir

It'll work with those indented lines as well - it actually will align block indentation to the column they should be starting at. So:

    /// [Collector] for cutter+recorder pair. NOCOMMIT: This is a long line comment that should not be broken into smaller chunks.
    ///   * also, the space after /// should be preserved
    ///   * like here.
  public FacetFieldCollector(FacetCutter facetCutter, FacetRecorder facetRecorder) {
    this.facetCutter = facetCutter;
    this.facetRecorder = facetRecorder;
  }

ends up this after running tidy:

  /// [Collector] for cutter+recorder pair. NOCOMMIT: This is a long line comment that should not be broken into smaller chunks.
  ///   * also, the space after /// should be preserved
  ///   * like here.
  public FacetFieldCollector(FacetCutter facetCutter, FacetRecorder facetRecorder) {
    this.facetCutter = facetCutter;
    this.facetRecorder = facetRecorder;
  }

dweiss avatar Mar 27 '25 07:03 dweiss

awesome! I really hope the patch is accepted: this will definitely give us a path forward.

rmuir avatar Mar 27 '25 12:03 rmuir

Let's wait a few days and see if there's any feedback. Like I mentioned above, what we use for formatting/checking format adherence shouldn't really matter for anybody who uses gradlew tidy in Lucene - gcf is an ASL licensed software and if we find anything problematic, I'll just publish my own fork and we can switch to that (and then we have the freedom of doing whatever, including applying markdown formatting of our own... like https://github.com/commonmark/commonmark-java).

dweiss avatar Mar 27 '25 12:03 dweiss

yeah: only thing I will say about choice of formatter is that normally I try to configure the editor in the repo to match what the build expects. e.g. for languages like python and typescript, it means disabling all formatting-related diagnostics and enabling "format-on-save" and "organize-imports-on-save", so that things "just work". It makes for easy dev experience where users have less friction with CI.

But with java, such a setup seems difficult to achieve: I'm not sure anyone even has that today.

rmuir avatar Mar 27 '25 12:03 rmuir