📎 Prettier Compatibility Metric
Description
Rome's goal is that our formatting matches Prettier's formatting closely. However, it's currently difficult to know if a PR is improving the compatibility or is making things worse.
Goal
Define a Prettier compatibility metric and provide means to compute the metric using the current Rome version.
Proposal
Percentage of lines that match Prettier's formatting, similar to git's similarity index
compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
compatibility = SUM(matching_lines) / SUM(MAX(lines_file_1, lines_file_2))
I'm not very proficient at math and the metric might be flawed. Please feel free to propose other metrics.
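As an illustration, here is a minimal Rust sketch of the per-file metric. The function name and signature are hypothetical, and the naive positional line comparison merely stands in for whatever diff-based counting a real implementation would use:

```rust
/// Hypothetical helper: share of lines that match between the Rome output
/// and the Prettier output, relative to the longer of the two files.
fn compatibility_per_file(rome_output: &str, prettier_output: &str) -> f64 {
    let rome_count = rome_output.lines().count();
    let prettier_count = prettier_output.lines().count();
    let max_lines = rome_count.max(prettier_count);

    if max_lines == 0 {
        // Two empty files are trivially identical.
        return 1.0;
    }

    // Naive positional comparison; a real implementation would likely count
    // matching lines via a proper diff, similar to git's similarity index.
    let matching_lines = rome_output
        .lines()
        .zip(prettier_output.lines())
        .filter(|(rome, prettier)| rome == prettier)
        .count();

    matching_lines as f64 / max_lines as f64
}
```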
Code Pointers
Our test runner already has an option to generate a report by setting the REPORT_PRETTIER env variable.
https://github.com/rome/tools/blob/db61c4b3a9b3e09811333346ed6d32019bc47dfc/crates/rome_js_formatter/tests/prettier_tests.rs#L270-L297
It should be straightforward to
- Compute the metric for every file and include it in the report
- Compute the metric across all files and print it at the top of the report.
Stretch
- Create a CI job similar to the parser conformance job that computes the overall metric on the PR branch and on main, and comments the two numbers together with the difference PR - main (the percentage by which the PR improved the compatibility)
- Include a link to the full report in the comment (it should be possible to commit the report as a gist)
- Ultimate solution: Compare the metric for every file and report the count of files for which the metric a) didn't change, b) increased, c) decreased, and list the names of the files for which the metric increased/decreased (a sketch follows below)
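To make the last point more concrete, here is a hedged sketch of how the per-file comparison between main and a PR branch could look. The report shape (a map from file name to per-file metric) and all names are assumptions, not the format the test runner currently emits:

```rust
use std::collections::BTreeMap;

/// Hypothetical summary of how the per-file metric moved between main and a PR.
#[derive(Debug, Default)]
struct MetricDiff {
    unchanged: usize,
    increased: Vec<String>,
    decreased: Vec<String>,
}

/// Compare the per-file compatibility on the PR branch against main.
fn compare_reports(main: &BTreeMap<String, f64>, pr: &BTreeMap<String, f64>) -> MetricDiff {
    let mut diff = MetricDiff::default();
    for (file, pr_metric) in pr {
        match main.get(file) {
            Some(main_metric) if (pr_metric - main_metric).abs() < 1e-9 => diff.unchanged += 1,
            Some(main_metric) if pr_metric > main_metric => diff.increased.push(file.clone()),
            Some(_) => diff.decreased.push(file.clone()),
            // Files that only exist on the PR branch are counted as unchanged here;
            // a real job might want to list them separately.
            None => diff.unchanged += 1,
        }
    }
    diff
}
```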
I would like to have a try.
Awesome. I assigned you the issue. Ping me if something is unclear or if you need some pointers. I recommend approaching this problem step by step (PR by PR): build out the tools first and test them as a CLI before approaching the CI.
I propose another formula to calculate the Prettier compatibility:
compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
compatibility = Sum(compatibility_per_file) / number_of_files
There is no definitive answer as to which one is better; it depends. For example, assume we have three files A, B, and C:
matching_lines: A: 10000, B: 50, C: 60
MAX(lines_file_1, lines_file_2): A: 10010, B: 100, C: 100
Using the same formula for compatibility_per_file, we get:
compatibility_a: 0.999
compatibility_b: 0.5
compatibility_c: 0.6
the result of the first compatibility formula: compatibility: 0.9902
the result of my compatibility formula: compatibility: 0.70
If someone resolves all the compatibility issues of file B, the per-file compatibilities become:
compatibility_a: 0.999
compatibility_b: 1
compatibility_c: 0.6
the result of the first compatibility formula: compatibility: 0.995
the result of my compatibility formula: compatibility: 0.866
The compatibility diff of the first formula: 0.990 -> 0.995; the compatibility diff of the second formula: 0.70 -> 0.866.
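For reference, a small Rust snippet that reproduces the numbers of this example (the matching and total line counts are the ones given above):

```rust
fn main() {
    // Worked example: (matching_lines, MAX(lines_file_1, lines_file_2)) for A, B, C.
    let files = [(10_000_usize, 10_010_usize), (50, 100), (60, 100)];

    // Line based: total matching lines over total line count.
    let matching: usize = files.iter().map(|(m, _)| m).sum();
    let total: usize = files.iter().map(|(_, t)| t).sum();
    println!("line based: {:.3}", matching as f64 / total as f64); // ~0.990

    // File based: average of the per-file ratios.
    let ratios: f64 = files.iter().map(|(m, t)| *m as f64 / *t as f64).sum();
    println!("file based: {:.3}", ratios / files.len() as f64); // ~0.700
}
```

Changing B's matching lines to 100 reproduces the second pair of numbers (approximately 0.995 and 0.866).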
I prefer to call the formula that @MichaReiser proposed "line based" and the second formula "file based".
My understanding is that you're proposing an alternative metric for the overall compatibility while keeping the same metric for a single file.
The file based metric calculates the average of the per-file compatibilities, whereas the "line based" metric tries to measure how many lines in total are similar. That's why I would call these Similarity (line based) and Avg similarity (file based).
In my view, both of these provide valuable signal and I would recommend implementing both to see which one works better to track our work. What do you think?
agree
File Based Average Prettier Similarity:
compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
file_based_average_prettier_similarity = Sum(compatibility_per_file) / number_of_files
Line Based Average Prettier Similarity:
compatibility_per_file = matching_lines / MAX(lines_file_1, lines_file_2)
line_based_average_prettier_similarity = SUM(matching_lines) / SUM(MAX(lines_file_1, lines_file_2))
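A possible Rust shape for both aggregates, assuming the per-file counts have already been collected (the struct and function names are illustrative, not the actual report code, and both functions assume a non-empty list):

```rust
/// Per-file counts needed by both metrics.
struct FileStats {
    matching_lines: usize,
    /// MAX(lines_file_1, lines_file_2)
    max_lines: usize,
}

/// File based: average of the per-file ratios; every file weighs the same.
fn file_based_average_prettier_similarity(files: &[FileStats]) -> f64 {
    let sum: f64 = files
        .iter()
        .map(|f| f.matching_lines as f64 / f.max_lines as f64)
        .sum();
    sum / files.len() as f64
}

/// Line based: total matching lines over total line count; large files weigh more.
fn line_based_average_prettier_similarity(files: &[FileStats]) -> f64 {
    let matching: usize = files.iter().map(|f| f.matching_lines).sum();
    let total: usize = files.iter().map(|f| f.max_lines).sum();
    matching as f64 / total as f64
}
```

As the worked example above shows, the file based average reacts strongly to fixes in small files, while the line based one tracks the overall share of matching lines.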
I saw this PR was merged: https://github.com/rome/tools/pull/2574
What's missing now?
The metric is merged but what would be nice to have is a CI job that comments with the current metric and compares it with main (ideally, per file).
I am still working on the CI.
I have some concerns about a numerical metric. Testing out Rome on some small projects, I've noticed that there are large diffs that occur from very small changes like trailing commas. On the flip side, there are some changes that are small in line diffs, but produce formatted output that, at least in my view, is harder to read and less appealing visually.
I don't want to discourage a numerical metric, but I do think we should take more stuff into consideration when thinking about compatibility.
This metric is a tool that helps us approximate the Prettier compatibility. It isn't an exact representation. Nevertheless, it helps us measure whether we are moving in the right direction and get a rough understanding of how close we are. However, it doesn't mean that our ultimate goal is to reach 100% and that we should optimize for it at any cost. That would be a misuse of the metric.
Regarding trailing commas: we should make sure that we compare apples with apples, meaning we should apply the same formatting options.
This issue is stale because it has been open 14 days with no activity.