lintr icon indicating copy to clipboard operation
lintr copied to clipboard

Upstreaming linters from Google's internal linting suite

Open MichaelChirico opened this issue 3 years ago • 11 comments

Internally at Google we use lintr to power our in-editor and CI-based R linting tools. Over the last year+ (in addition to working directly on lintr) we have been building out a set of linters that extends the suite offered by lintr. Currently, the package has >60 such linters, almost all of which we think are widely relevant. They have all been test on & applied to existing internal R code at Google.

We are happy to contribute any subset of these to lintr. Here is a brief summary of all of them; I can elaborate more as needed:

  • [x] AnyDuplicatedLinter - Require usage of anyDuplicated() > 0 over any(duplicated(.))
  • [x] AnyIsNALinter - Require usage of anyNA over any(is.na(.))
  • [x] ApplyAnonymousLinter - Block usage of anonymous functions in *apply functions when unnecessary #1541
  • [ ] BaseOverwriteLinter - Block assigning any variables whose name clashes with a 'base' R function
  • [x] BooleanArithmeticLinter #1579
  • [ ] CharacterOnlyLinter - Enforce library calls using symbols
  • [x] ClassEqualsLinter - Block comparison of class with ==
  • [ ] ComparisonNegationLinter
  • [ ] ConsecutiveMutateLinter
  • [ ] ConsecutiveSuppressionLinter
  • [x] ElseSameLineLinter - Require else to come on the same line as
  • [x] EmptyAssignmentLinter - Block assignment of {} #1637
  • [x] ExpectComparisonLinter - Require usage of expect_gt(x, y) over expect_true(x > y) (and similar) #955
  • [x] ExpectIdenticalLinter - Require usage of expect_identical(x, y) where appropriate #958
  • [x] ExpectIsLinter - Redirect away from deprecated testthat::expect_is() #946
  • [x] ExpectLengthLinter - Require usage of expect_length(x, n) over expect_equal(length(x), n) #950
  • [x] ExpectNamedLinter - Require usage of expect_named(x, n) over expect_equal(names(x), n) #949
  • [x] ExpectNotLinter - Require usage of expect_false(.) over expect_true(!.) #945
  • [x] ExpectNullLinter - Require usage of expect_null(x) over expect_equal(x, NULL) and similar #887
  • [x] ExpectS3ClassLinter - Require usage of expect_s3_class() or expect_s4_class() where appropriate. #943
  • [x] ExpectS4ClassLinter - Require usage of expect_s4_class(x, k) over expect_true(is(x, k)) #943
  • [x] ExpectTrueFalseAndConditionLinter - Force && conditions in expect_true(), expect_false() to be written separately #948
  • [x] ExpectTrueFalseLinter - Require usage of expect_true(x) over expect_equal(x, TRUE) #947
  • [x] ExpectTypeLinter - Require usage of expect_type(x, type) over expect_equal(typeof(x), type) #924
  • [ ] FilterAndConditionLinter
  • [x] FixedRegexLinter - Require usage of 'fixed=TRUE' in regular expressions where appropriate #1021
  • [x] ForLoopIndexLinter - Block usage of for loops directly overwriting the indexing variable #1629
  • [x] FunctionBraceLinter - Require multi-line functions to use braces
  • [x] GrepSubsetLinter - Require usage of grep(value = TRUE) over xgrep(x)
  • [x] IfelseCensorLinter - Block usage of ifelse where pmin or pmax is more appropriate
  • [x] IfElseMatchBracesLinter - Require both or neither if/else branches to use curly braces
  • [ ] IfNotElseLinter
  • [ ] IfSwitchLinter - Require usage of switch() over repeated if/else blocks
  • [x] InnerCombineLinter - Require c() to be applied before relatively expensive vectorized functions
  • [ ] InnerComparisonLinter - Require == to be used outside of sapply() when comparing to a constant
  • [x] IsNumericLinter - Redirect is.numeric(x) || is.integer(x) to just use is.numeric(x) #1635
  • [ ] KeywordQuoteLinter - Block unnecessary quoting in calls
  • [ ] LengthLevelsLinter - Require usage of nlevels over length(levels(.))
  • [x] LengthsLinter #1568
  • [ ] ListComparisonLinter
  • [x] LiteralCoercionLinter - Require usage of correctly-typed literals over literal coercions
  • [ ] MagrittrDotLinter #1656
  • [x] MultipleStopifnotLinter - Force consecutive stopifnot() calls into just one
  • [x] NestedIfelseLinter - Block usage of nested ifelse() calls
  • [ ] NestedPipeLinter
  • [ ] NrowSubsetLinter
  • [x] NumericLeadingZeroLinter - Require usage of a leading zero in all fractional numerics
  • [ ] OneCallPipeLinter
  • [x] OuterNegationLinter - Require usage of !any(.) over all(!.), !all(.) over any(!.)
  • [ ] PasteFilePathLinter - Block usage of paste() with sep='/'
  • [x] PasteSepLinter - Block usage of paste() with sep=''
  • [ ] PasteStrrepLinter - Require usage of strrep('.', n) over paste(rep('.', n), collapse = '') #1652
  • [x] PasteToStringLinter - Block usage of paste() with collapse=', '
  • [x] PipeCallLinter - Force explicit calls in magrittr pipes #801
  • [ ] PipeReturnLinter
  • [x] RedundantEqualsLinter #1556
  • [x] RedundantIfelseLinter - Prevent ifelse() from being used to produce TRUE/FALSE
  • [ ] RepLenLinter
  • [x] ReturnAssignmentLinter #1569
  • [ ] RequireNzcharLinter - Require usage of nzchar where appropriate
  • [x] RoutineRegistrationLinter - Identify .Call usage to unregistered routines #1669
  • [ ] SampleIntLinter - Require usage of sample.int(n, m, ...) over sample(1:n, m, ...)
  • [ ] ScalarInLinter - Block usage like x %in% 'a'
  • [ ] StopifnotAllLinter
  • [x] StopifnotAndConditionLinter - Force && conditions in stopifnot() to be written separately
  • [x] StopPasteLinter - Block usage of paste() and paste0() with messaging functions using ...
  • [x] StringsAsFactorsLinter - Identify cases where stringsAsFactors should be supplied explicitly
  • [x] SystemFileLinter - Block usage of file.path() with system.file()
  • [ ] TerminalCloseLinter - Prohibit close() from terminating a function definition
  • [ ] UnnecessaryNestingLinter
  • [x] UnreachableCodeLinter - Block unreachable code and comments following return statements
  • [x] VectorLogicLinter - Enforce usage of scalar && in conditional statements
  • [ ] WhichGreplLinter - Require usage of grep over which(grepl(.))
  • [x] YodaTestLinter - Block obvious 'yoda tests' #957

What do you think is the best way to decide about which are in/out of scope for lintr? Does it make sense to delay this until 3.0 is released, or push to include with the major version bump?

MichaelChirico avatar Nov 10 '21 04:11 MichaelChirico

I think it would be great to include these in the 3.0 release. It would also give people more incentive to upgrade.

It might also be worthwhile to package these as a separate object, so it would be easy to use them all as google_linters like we currently have for default_linters.

As to which of them should be on by default I leave that to others to discuss.

jimhester avatar Nov 10 '21 15:11 jimhester

That sounds really great! I also like the idea of adding google_linters. In the same spirit, should we add all_linters as well? At the moment, non-default linters are somewhat hard to discover. We ended up going through the entire ?linters manpage and look at all linters to decide whether we might want to use it. With all_linters the workflow for deciding on a subset could be to first enable all and then disable those that actually cause unwanted lints in the code.

To decide on the defaults, it will probably help to look at the concrete implementations.

Curious side question regarding ScalarInLinter: That construct is my go-to way to check if x == "a" when x is known to contain NAs. Is there a better, equally efficient, way to do this? Illustrative code:

x <- c(NA, "a", "b", "c", NA)
x == "a" # c(NA, TRUE, FALSE, FALSE, NA)
x %in% "a" # c(FALSE, TRUE, FALSE, FALSE, FALSE) -- much safer for e.g. indexing

AshesITR avatar Nov 11 '21 07:11 AshesITR

no, it's a known false positive of the linter. we decided it's better for readability to treat NA handling explicitly, if so intended, and put that caveat in the lint message:

!is.na(x) & x == "a"

MichaelChirico avatar Nov 11 '21 08:11 MichaelChirico

@AshesITR with #958 we'll finish up the current set of 13 testthat-related linters we have.

I think now is a good time to think if we want to triage the prep for lintr 3.0.0 -- have have O(100) linters ready for upstreaming, which would put us at about June to finish, assuming we keep up the current (quite high) pace of new PRs.

Should we prioritize instead? Pick some set of particularly neat/high-impact linters to include in the 3.0.0 release, then focus on tidying up 3.0.0 and putting it on CRAN before returning to new google linters?

MichaelChirico avatar Mar 18 '22 17:03 MichaelChirico

Sounds good to me. Do you have any linters in mind that fire frequently and could be considered high-impact? We could aim for a 3.0.1 release containing the rest of the google linters.

AshesITR avatar Mar 19 '22 08:03 AshesITR

@MichaelChirico I edited your OP with a checklist and linked all PRs / issues I was aware of. Are there new linters to be upstreamed or is the list complete?

AshesITR avatar Mar 19 '22 10:03 AshesITR

Hello! I am interested in applying the Google R Style Guide to one of our packages. Thus, I am thrilled to see that you consider supplying a lintr::google_linters list to the lintr package. Is this list in planning and will it be added to the package in the near future?

LukasDSauer avatar Apr 26 '22 08:04 LukasDSauer

there are a slew already added -- see #962

MichaelChirico avatar Apr 26 '22 08:04 MichaelChirico

Hi Michael and thank you for the quick reply. As far as I understand, #962 contains a list of Google linters already added to lintr. Jim Hester suggested above that available Google linters could be packaged in a separate object:

It might also be worthwhile to package these as a separate object, so it would be easy to use them all as google_linters like we currently have for default_linters.

Has this already been done or would I have to include all of the linters in #962 one-by-one?

LukasDSauer avatar Apr 26 '22 08:04 LukasDSauer

The new tag facility could be used for this. You would then be able to use linters_with_tags(tags = "google"). Note though that some (default) linters are configured differently from the google style guide, so it may be necessary to provide a google_linters object for full compatibility, @MichaelChirico WDYT?

AshesITR avatar Apr 26 '22 11:04 AshesITR

About Bioconductor's style mentioned in the blog (and with an open issue at #43), there is already a package which provides an approximation to the Bioconductor style using lintr. But how would that work?

Should this follow the broom/tidymodels strategy to provide some basics and leave the different methods, styles in this case, to different packages or should all be "centralized" and organized here?

llrs avatar Jul 15 '22 23:07 llrs

With #2321, that's it!

Special shout-out to @AshesITR and @IndrajeetPatil for your time and insights during the review process! The suite is now much better thanks to you! Thanks also to others who have filed follow-ups/tested the new linters on your own projects.

MichaelChirico avatar Dec 06 '23 05:12 MichaelChirico