lintr icon indicating copy to clipboard operation
lintr copied to clipboard

Use file-level where possible for faster computation

Open AshesITR opened this issue 1 year ago • 7 comments

There are only two linters incompatible with file-level lints (as evidenced by the hacky PR failures here):

  • cyclocomp_linter
  • spaces_left_parentheses_linter

All other linters could compute on the single file-level source expression, for potentially huge gains by avoiding function calls, loops, appends, ...

What needs to be solved is how to cache and retrieve lints in this run mode. Once that's done, we can add a new attribute (max_level?) to Linter() that signals lint() that the linter can handle parallel linting of all expression-level source expressions.

WDYT about the idea? Do you have any ideas on the cache part? I'm especially interested in the scenario where a cache entry is available for most, but not all individual expressions.

AshesITR avatar Dec 15 '23 06:12 AshesITR

Some thoughts: get_lints() will have to be broken up into an execution planning phase

  1. peek cache for all (linter, expr) combinations
  2. schedule available entries for cache retrieval
  3. schedule cache misses for (parallel) linting

and a run phase

  1. retrieve everything cached from the cache
  2. run parallel linting for linters that support it and cache their results (expr-level)
  3. run sequential linting for all other linters and cache their results
  4. combine lints from steps 1-3

AshesITR avatar Dec 15 '23 06:12 AshesITR

Not bad so far, but probably still some room for improvement:

devtools::load_all()
system.time( lint_package())
# here:
#   user  system elapsed 
# 89.799   0.460  90.362 
# main:
#    user  system elapsed 
# 104.109   0.473 104.654 

AshesITR avatar Dec 15 '23 23:12 AshesITR

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (4b59aac) 98.53% compared to head (9c7db92) 98.13%.

:exclamation: Current head 9c7db92 differs from pull request most recent head f5e633d. Consider uploading reports for the commit f5e633d to get more accurate results

Files Patch % Lines
R/lint.R 83.21% 23 Missing :warning:
R/source_utils.R 75.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2449      +/-   ##
==========================================
- Coverage   98.53%   98.13%   -0.41%     
==========================================
  Files         126      126              
  Lines        5676     5799     +123     
==========================================
+ Hits         5593     5691      +98     
- Misses         83      108      +25     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Dec 15 '23 23:12 codecov-commenter

   user  system elapsed 
 53.661   0.446  54.201 

Now just have to fix all the unexpected problems and add some tests for the newly introduced branches :)

AshesITR avatar Dec 16 '23 00:12 AshesITR

Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.

Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

AshesITR avatar Dec 16 '23 01:12 AshesITR

Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.

Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

indeed looks amazing! I also wonder if we should add an option to get_source_expressions() to skip building the expr-level objects. this after https://github.com/Rdatatable/data.table/issues/5830#issuecomment-1858168068 where a massive file spends the vast majority of compute time on this step.

anyway, I'm thinking it's prudent to save this PR for after release -- messing around with the caching seems like a minefield for hard-to-catch bugs. would be good to let this hang around in dev for longer to see what bubbles up.

MichaelChirico avatar Dec 16 '23 01:12 MichaelChirico

I had thought about lazily building them, but it's not useful as long as not all linters support "batches", because they are needed at least once anyway.

Maybe there is a more performant way to create the objects under the assumption that the tree is read-only.

AshesITR avatar Dec 16 '23 17:12 AshesITR