lintr
lintr copied to clipboard
duplicate_argument_linter should skip on mutate() calls
dplyr::mutate()
allows sequential updates to variables, e.g.
x %>%
dplyr::mutate(
col = gsub("t", "", col),
col = gsub("\\s+$", "xxx", col)
)
but duplicate_argument_linter()
sees col
used twice and throws.
Not throwing the 3.0.0 milestone quite yet...
mutate
and transmute
(?) could be added to the default exclude list.
OTOH I don't like this style and, at least on older versions of dbplyr
, that backend actually didn't work with this kind of code, since reusing columns in SQL isn't possible. So I'm slightly in favor of linting this by default.
WDYT?
The problem IMO is every mutate()
copies the table again which is quite wasteful in general. For dbplyr
, it's not as wasteful since those backends are typically lazy.
We have a ConsecutiveMutateLinter()
for precisely this, and it attempts a carveout for dbplyr
by looking for library(dbplyr)
in the script.
Maybe we should have @hadley or @DavisVaughan weigh in here about dplyr
best practices.
I personally would convert the above to use pipe:
x |>
dplyr::mutate(
col = col |> str_replace("t", "") |> str_replace("\\s+$", "xxx")
)
But I don't thinking mutating a variable multiple times is an anti-pattern. dplyr copies the column involved, so it's not quite as expensive as you might think.
OK, looks like an allowlist is the way to go then. not tagging for 3.0.0 release either.