lintr
lintr copied to clipboard
Use file-level where possible for faster computation
There are only two linters incompatible with file-level lints (as evidenced by the hacky PR failures here):
- cyclocomp_linter
- spaces_left_parentheses_linter
All other linters could compute on the single file-level source expression, for potentially huge gains by avoiding function calls, loops, appends, ...
What needs to be solved is how to cache and retrieve lints in this run mode.
Once that's done, we can add a new attribute (max_level?) to Linter() that signals lint() that the linter can handle parallel linting of all expression-level source expressions.
WDYT about the idea? Do you have any ideas on the cache part? I'm especially interested in the scenario where a cache entry is available for most, but not all individual expressions.
Some thoughts: get_lints() will have to be broken up into an execution planning phase
- peek cache for all (linter, expr) combinations
- schedule available entries for cache retrieval
- schedule cache misses for (parallel) linting
and a run phase
- retrieve everything cached from the cache
- run parallel linting for linters that support it and cache their results (expr-level)
- run sequential linting for all other linters and cache their results
- combine lints from steps 1-3
Not bad so far, but probably still some room for improvement:
devtools::load_all()
system.time( lint_package())
# here:
# user system elapsed
# 89.799 0.460 90.362
# main:
# user system elapsed
# 104.109 0.473 104.654
Codecov Report
Attention: 24 lines in your changes are missing coverage. Please review.
Comparison is base (
4b59aac) 98.53% compared to head (9c7db92) 98.13%.
:exclamation: Current head 9c7db92 differs from pull request most recent head f5e633d. Consider uploading reports for the commit f5e633d to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| R/lint.R | 83.21% | 23 Missing :warning: |
| R/source_utils.R | 75.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2449 +/- ##
==========================================
- Coverage 98.53% 98.13% -0.41%
==========================================
Files 126 126
Lines 5676 5799 +123
==========================================
+ Hits 5593 5691 +98
- Misses 83 108 +25
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
user system elapsed
53.661 0.446 54.201
Now just have to fix all the unexpected problems and add some tests for the newly introduced branches :)
Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.
Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲
Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.
Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲
indeed looks amazing! I also wonder if we should add an option to get_source_expressions() to skip building the expr-level objects. this after https://github.com/Rdatatable/data.table/issues/5830#issuecomment-1858168068 where a massive file spends the vast majority of compute time on this step.
anyway, I'm thinking it's prudent to save this PR for after release -- messing around with the caching seems like a minefield for hard-to-catch bugs. would be good to let this hang around in dev for longer to see what bubbles up.
I had thought about lazily building them, but it's not useful as long as not all linters support "batches", because they are needed at least once anyway.
Maybe there is a more performant way to create the objects under the assumption that the tree is read-only.