lintr Turn off line length linter within string literals (optionally?)

We've got a very common scenario where string literals are used to contain big SQL queries.

Crucially, the style guide for SQL & R differ in that the former's max line width is 100, the latter 80.

Moreover, to skip only one such line we've got to somewhat awkwardly inject the nolint tag inside the SQL:

query <- "
  SELECT SUM(CASE WHEN some_column > 5 THEN 5 ELSE some_column END) AS output_column -- # nolint
"

It seems to me the best solution here is to simply ignore string literal "bodies", perhaps with an option.

Happy to implement this if others agree.

Sep 13 '21 16:09 MichaelChirico

Not sure if we should, but we have a similar problem with long machine-readable comments, so it would be nice to also control linting of comment lines.

What should qualify a line for exclusion in both cases? E.g. what if a string / comment starts past the max line length?

Sep 13 '21 18:09 AshesITR

by "bodies" i meant to exclude the first and last line, e.g.

x <- " # included for linting
abcdef # excluded from linting
ghijkl # excluded from linting
"      # included for linting

So it seems like we could add two options to the linter:

(1) exclude_comments = FALSE (by default) (2) exclude_string_bodies = FALSE (by default)

Sep 14 '21 00:09 MichaelChirico

Giving this a bit more thought, one use case with the string line length issue is with very long paths, where adding newlines to the string is not an option. Therefor I feel we should exclude the string expression completely, i.e. only lint line_length if new code starts past the end of a long string or is too long in the line before the string starts.

To clarify what I mean:

# lints:
read_csv("very_long_path.csv", col_types = cols(.default = col_character()))
read_csv(col_types = cols(very_long_cols_declaration), file = "short_string")

# not lints:
read_csv("very_long_path.csv")
read_csv(
  file = "very_long_path.csv",
  col_types = cols(
    very_long_cols_declaration
  )
)

IINM this behavior could be achieved by treating every string literal as length 0

Sep 28 '21 18:09 AshesITR

We do have loads of long file names, but usually it's because of the leading directories, and that can be solved using file.path().

OTOH, we also have a lot of SQL table names / nested column structures accessed with . that more awkward to split using paste0 or paste(sep = '.').

Sep 28 '21 19:09 MichaelChirico

Related: for me the place where the line_length always bites me is with urls. Not flagging a line that's just a single string with no spaces (or maybe detecting urls specifically?) would be a nice feature.

I often find myself doing something like:

combined_sd <- function(mean, sd, n) {
  # based on
  # https://math.stackexchange.com/questions/2971315/how-do-i-combine-standard-deviations-of-two-groups # nolint
  ...
}

Oct 15 '21 16:10 d-sci

NB for Stackexchange specifically, you can shorten the URL drastically:

https://math.stackexchange.com/questions/2971315/how-do-i-combine-standard-deviations-of-two-groups

Is equivalent to

https://math.stackexchange.com/q/2971315

The no spaces heuristic breaks down for long paths that contain embedded spaces. That said, it seems very hard to figure out a good heuristic which would neither cause many false-positives nor too many false negatives.

Oct 15 '21 20:10 AshesITR

# not lints:
read_csv("very_long_path.csv")
read_csv(
  file = "very_long_path.csv",
  col_types = cols(
    very_long_cols_declaration
  )
)
IINM this behavior could be achieved by treating every string literal as length 0

Hmm, I might lint in the first case, requiring the string literal to be "alone" on the line to be exempted:

read_csv(
  "very_long_path.csv"
)

"alone" because the latter case, where the string is the target of an EQ_SUB, would be OK since forcing this is somewhat awkward:

read_csv(
  file =
    "very_long_path.csv"
)

Note to self that we'll also have to exclude the OP-COMMA on the same line, if present.

Aug 04 '23 06:08 MichaelChirico

Related: for me the place where the line_length always bites me is with urls. Not flagging a line that's just a single string with no spaces (or maybe detecting urls specifically?) would be a nice feature.

I often find myself doing something like:
combined_sd <- function(mean, sd, n) {
  # based on
  # https://math.stackexchange.com/questions/2971315/how-do-i-combine-standard-deviations-of-two-groups # nolint
  ...
}

@d-sci ICYMI you can now (in dev & soon on CRAN) switch to # nolint next if that's better for you:

# nolint next: line_length_linter.
# https://...

Sep 18 '23 20:09 MichaelChirico

lintr lintr copied to clipboard

Turn off line length linter within string literals (optionally?)

lintr
lintr copied to clipboard