lintr Use file-level where possible for faster computation

There are only two linters incompatible with file-level lints (as evidenced by the hacky PR failures here):

cyclocomp_linter
spaces_left_parentheses_linter

All other linters could compute on the single file-level source expression, for potentially huge gains by avoiding function calls, loops, appends, ...

What needs to be solved is how to cache and retrieve lints in this run mode. Once that's done, we can add a new attribute (max_level?) to Linter() that signals lint() that the linter can handle parallel linting of all expression-level source expressions.

WDYT about the idea? Do you have any ideas on the cache part? I'm especially interested in the scenario where a cache entry is available for most, but not all individual expressions.

Dec 15 '23 06:12 AshesITR

Some thoughts: get_lints() will have to be broken up into an execution planning phase

peek cache for all (linter, expr) combinations
schedule available entries for cache retrieval
schedule cache misses for (parallel) linting

and a run phase

retrieve everything cached from the cache
run parallel linting for linters that support it and cache their results (expr-level)
run sequential linting for all other linters and cache their results
combine lints from steps 1-3

Dec 15 '23 06:12 AshesITR

Not bad so far, but probably still some room for improvement:

devtools::load_all()
system.time( lint_package())
# here:
#   user  system elapsed 
# 89.799   0.460  90.362 
# main:
#    user  system elapsed 
# 104.109   0.473 104.654

Dec 15 '23 23:12 AshesITR

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (4b59aac) 98.53% compared to head (9c7db92) 98.13%.

:exclamation: Current head 9c7db92 differs from pull request most recent head f5e633d. Consider uploading reports for the commit f5e633d to get more accurate results

Files	Patch %	Lines
R/lint.R	83.21%	23 Missing :warning:
R/source_utils.R	75.00%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2449      +/-   ##
==========================================
- Coverage   98.53%   98.13%   -0.41%     
==========================================
  Files         126      126              
  Lines        5676     5799     +123     
==========================================
+ Hits         5593     5691      +98     
- Misses         83      108      +25

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Dec 15 '23 23:12 codecov-commenter

   user  system elapsed 
 53.661   0.446  54.201

Now just have to fix all the unexpected problems and add some tests for the newly introduced branches :)

Dec 16 '23 00:12 AshesITR

Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.

Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

Dec 16 '23 01:12 AshesITR

Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.

Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

indeed looks amazing! I also wonder if we should add an option to get_source_expressions() to skip building the expr-level objects. this after https://github.com/Rdatatable/data.table/issues/5830#issuecomment-1858168068 where a massive file spends the vast majority of compute time on this step.

anyway, I'm thinking it's prudent to save this PR for after release -- messing around with the caching seems like a minefield for hard-to-catch bugs. would be good to let this hang around in dev for longer to see what bubbles up.

Dec 16 '23 01:12 MichaelChirico

I had thought about lazily building them, but it's not useful as long as not all linters support "batches", because they are needed at least once anyway.

Maybe there is a more performant way to create the objects under the assumption that the tree is read-only.

Dec 16 '23 17:12 AshesITR

lintr lintr copied to clipboard

Use file-level where possible for faster computation

Codecov Report

lintr
lintr copied to clipboard