
📎 Prettier Compatibility Metric

Open MichaReiser opened this issue 2 years ago • 11 comments

Description

Rome's goal is for our formatting to closely match Prettier's. However, it's currently difficult to know whether a PR is improving compatibility or making things worse.

Goal

Define a Prettier compatibility metric and provide means to compute the metric using the current Rome version.

Proposal

Percentage of lines that match Prettier's formatting, similar to git's similarity index

compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
compatibility = SUM(matching_lines) / SUM(MAX(lines_file_1, lines_file_2))
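The per-file formula could be sketched in Rust (the project's language) roughly as below. The helper names are hypothetical, not the actual implementation, and the naive positional line comparison stands in for a proper diff:

```rust
/// Count lines that are identical at the same position in both outputs.
/// A real implementation would use a diff algorithm (as git's similarity
/// index does) so a single inserted line doesn't misalign everything after it.
fn matching_lines(a: &str, b: &str) -> usize {
    a.lines().zip(b.lines()).filter(|(l, r)| l == r).count()
}

/// compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
fn compatibility_per_file(rome_output: &str, prettier_output: &str) -> f64 {
    let max_lines = rome_output
        .lines()
        .count()
        .max(prettier_output.lines().count());
    if max_lines == 0 {
        return 1.0; // both outputs empty: trivially compatible
    }
    matching_lines(rome_output, prettier_output) as f64 / max_lines as f64
}
```

For example, two two-line files that differ in one line score 0.5.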

I'm not very proficient at math and the metric might be flawed. Please feel free to propose other metrics.

Code Pointers

Our test runner already has an option to generate a Report by setting the REPORT_PRETTIER env variable.

https://github.com/rome/tools/blob/db61c4b3a9b3e09811333346ed6d32019bc47dfc/crates/rome_js_formatter/tests/prettier_tests.rs#L270-L297

It should be straightforward to

  • Compute the metric for every file and include it in the report
  • Compute the metric across all files and print it at the top of the report.

Stretch

  • Create a CI job, similar to the parser conformance job, that computes the overall metric on the PR branch and on main, and comments with the two numbers together with the difference PR - main (the percentage by which the PR improved compatibility)
  • Include a link to the full report (should be possible to commit the report as a gist) in the comment
  • Ultimate solution: compare the metric for every file and report the count of files for which the metric a) didn't change, b) increased, c) decreased. List the names of the files for which the metric increased or decreased

MichaReiser avatar May 07 '22 10:05 MichaReiser

I would like to have a try.

IWANABETHATGUY avatar May 10 '22 10:05 IWANABETHATGUY

> I would like to have a try.

Awesome. I assigned you the issue. Ping me if something is unclear or if you need some pointers. I recommend approaching this problem step by step (PR by PR): build out the tools first and test them as a CLI before approaching the CI.

MichaReiser avatar May 10 '22 11:05 MichaReiser

I propose another formula to calculate the Prettier compatibility:

compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
compatibility = Sum(compatibility_per_file) / number_of_files

There is no clear winner in every case; it depends. For example, assume we have three files A, B, and C:


matching_lines: A: 10000, B: 50, C: 60
MAX(lines_file_1, lines_file_2): A: 10010, B: 100, C: 100


We use the same formula to calculate compatibility_per_file, so we get:

compatibility_a:  0.999
compatibility_b:  0.5
compatibility_c:  0.6

The result of the first compatibility formula: compatibility ≈ 0.990

The result of my compatibility formula: compatibility ≈ 0.70

If someone resolves all the compatibility issues in file B, the per-file values become:

compatibility_a:  0.999
compatibility_b:  1
compatibility_c:  0.6

The result of the first compatibility formula: compatibility ≈ 0.995

The result of my compatibility formula: compatibility ≈ 0.866

The compatibility change under the first formula: 0.990 -> 0.995. Under the second formula: 0.70 -> 0.866.
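The comparison above can be reproduced with a small sketch (function names are hypothetical); each file is represented as a `(matching_lines, max_lines)` pair:

```rust
/// Line based: SUM(matching_lines) / SUM(MAX(lines_file_1, lines_file_2))
fn line_based(files: &[(usize, usize)]) -> f64 {
    let matched: usize = files.iter().map(|&(m, _)| m).sum();
    let total: usize = files.iter().map(|&(_, t)| t).sum();
    matched as f64 / total as f64
}

/// File based: SUM(compatibility_per_file) / number_of_files
fn file_based(files: &[(usize, usize)]) -> f64 {
    let sum: f64 = files.iter().map(|&(m, t)| m as f64 / t as f64).sum();
    sum / files.len() as f64
}
```

With A = (10000, 10010), B = (50, 100), C = (60, 100), `line_based` gives ≈ 0.990 and `file_based` gives ≈ 0.700; after fixing B to (100, 100), they become ≈ 0.995 and ≈ 0.866.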

I'd prefer to call @MichaReiser's formula line based and the second formula file based.

IWANABETHATGUY avatar May 11 '22 05:05 IWANABETHATGUY

My understanding is that you're proposing an alternative metric for the overall compatibility while keeping the same metric for a single file.

The file based metric calculates the average of the per file compatibilities, whereas the "line based" metric tries to measure how many lines in total are similar. That's why I would call these Similarity (line based) and Avg similarity (file based).

In my view, both of these provide valuable signal and I would recommend implementing both to see which one works better to track our work. What do you think?

MichaReiser avatar May 11 '22 06:05 MichaReiser

agree

IWANABETHATGUY avatar May 11 '22 06:05 IWANABETHATGUY

File Based Average Prettier Similarity:

compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
file_based_average_prettier_similarity = Sum(compatibility_per_file) / number_of_files

Line Based Average Prettier Similarity:

compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
line_based_average_prettier_similarity = SUM(matching_lines) / SUM(MAX(lines_file_1, lines_file_2))

IWANABETHATGUY avatar May 12 '22 09:05 IWANABETHATGUY

I saw this PR was merged: https://github.com/rome/tools/pull/2574

What's missing now?

ematipico avatar May 23 '22 08:05 ematipico

The metric is merged but what would be nice to have is a CI job that comments with the current metric and compares it with main (ideally, per file).

MichaReiser avatar May 23 '22 08:05 MichaReiser

I am still working on CI

IWANABETHATGUY avatar May 23 '22 08:05 IWANABETHATGUY

I have some concerns about a numerical metric. Testing out Rome on some small projects, I've noticed that there are large diffs that occur from very small changes like trailing commas. On the flip side, there are some changes that are small in line diffs, but produce formatted output that, at least in my view, is harder to read and less appealing visually.

I don't want to discourage a numerical metric, but I do think we should take more stuff into consideration when thinking about compatibility.

NicholasLYang avatar May 26 '22 22:05 NicholasLYang

> I have some concerns about a numerical metric. Testing out Rome on some small projects, I've noticed that there are large diffs that occur from very small changes like trailing commas. On the flip side, there are some changes that are small in line diffs, but produce formatted output that, at least in my view, is harder to read and less appealing visually.
>
> I don't want to discourage a numerical metric, but I do think we should take more stuff into consideration when thinking about compatibility.

This metric is a tool that helps us approximate Prettier compatibility; it isn't an exact representation. Nevertheless, it helps us measure whether we are moving in the right direction and gives a rough understanding of how close we are. However, that doesn't mean our ultimate goal is to reach 100%, or that we should optimize for it at any cost. That would be a misuse of the metric.

Regarding trailing commas: we should make sure that we compare apples with apples, meaning we should apply the same formatting options to both formatters.

MichaReiser avatar May 27 '22 07:05 MichaReiser

This issue is stale because it has been open 14 days with no activity.

github-actions[bot] avatar Sep 27 '22 12:09 github-actions[bot]