styler icon indicating copy to clipboard operation
styler copied to clipboard

Enforcing a 80-character line length rule?

Open jarodmeng opened this issue 6 years ago • 30 comments

Is it possible to enforce a rule to limit code to 80 character per line? styler currently doesn't seem to do that. The closest issue I can find is #118, but it's not clear if that option is available.

jarodmeng avatar Oct 22 '17 15:10 jarodmeng

Thanks for having consulted open issues first. #118 is an edge case as far as indention goes and not quite what you are referring to. The task of splitting lines of code with more than 80 characters is non-trivial as it is not a-priori clear at which position lines should be split. I think this may also not follow a general and abstract rule (as the other styling rules) but is best done manually and decided on a case-to-case basis. Therefore, this feature is currently no supported in styler and will probably no be so in the near future. We suggest to use the packages lintr to detect lines with more than 80 characters and rearrange manually.

Edit: I think we aim to support this in the long run.

lorenzwalthert avatar Oct 22 '17 19:10 lorenzwalthert

I'm working on (finally!) releasing utf8, this means we could release styler soon. After this I think we could start thinking about that 80-character rule, because editing code from "wide" to "long" takes quite a lot of time (measured in seconds, but still). I'd like to suggest to do it at the level of function calls and arithmetic expressions first:

  • If a single-line function call hits the 80-character boundary, add a line break after the opening paren, and proceed with usual rules.
  • If an arithmetic expression is too wide, add line breaks after lowest-prio operators (i.e., the + in 1 * 2 + 3 * 4)

Maybe start with only function calls -- if we add a set of rules after all other rules (because we need to be sure about the width), and then do only reindention of function call arguments?

krlmlr avatar Oct 23 '17 08:10 krlmlr

Thanks for the reply! I agree that it's not easy to implement such a rule. @krlmlr 's suggestion looks good to me.

jarodmeng avatar Oct 23 '17 17:10 jarodmeng

So do you @krlmlr suggest to take the 80 character limit from the (raw) unstyled expression, right?

lorenzwalthert avatar Oct 23 '17 22:10 lorenzwalthert

No, that would be after styling, when we know the final width -- we'd just add linebreaks but not change much otherwise.

krlmlr avatar Oct 24 '17 07:10 krlmlr

Issues that can't be resolved soonish will be closed and labelled "Status: Postponed".

lorenzwalthert avatar Nov 03 '17 23:11 lorenzwalthert

Reference: #65.

lorenzwalthert avatar Feb 18 '18 21:02 lorenzwalthert

@lorenzwalthert , I've implemented a version of this, by adding a loop to parse_transform_serialize to do the following:

  1. Run through the formatting and flattening process to obtain flattened_pd.
  2. See if any of the tokens have an end column greater than the limit. If any do, note the first token with this property and add its pos_id to a list. If none do, render the text and exit.
  3. Prepend a formatter (I call it add_line_break_at_pos_id constructed by mk_add_line_break_at_pos_id) to the line_break formatters. This formatter increments lag_newlines for the elements which either have one of the stored pos_ids or have a child that does.
  4. Repeat from Step 1 with the additional formatter, until exit.

I've committed the implementation to a fork (4017b39a9c). It's not pull-request-ready, and I would appreciate your input as to whether it's worth developing into something that is.

krivit avatar Apr 09 '18 09:04 krivit

Thanks @krivit for looking into it. To be honest, I first need to think about what I had in mind earlier this year about implementing that feature. I think there were some obstacles I can't quite remember as I was reflecting on different approaches. As it looks to me, your approach seems to work (maybe need to handle cases where indention >= 80 so line break won't help at all and we will be stuck in infinite loop) but I am not sure if there is a better way of doing it. Anyways, I first have to fully understand what you do and maybe also giving it a fresh try myself - at least conceptually, so we can figure out the best way of implementing that feature. As soon as we have reached a high-level consensus about how to go about it, I think we can start moving your branch towards that if you are willing to.

I could give you some feedback on your code right away but I think I first want to think about the larger picture before we optimise the details.

Also, I would like to only implement line breaks for function calls first as @krlmlr suggested in his comment.

Is that ok for you?

lorenzwalthert avatar Apr 10 '18 18:04 lorenzwalthert

Thanks for reopening the issue. The plan sounds fine. The way I see it, any implementation will be inherently inelegant, for two reasons:

  • Every line break inserted will change how the rest of that line is indented.
  • Inserting a line break can turn a one-line expression into a multiline expression, which follows different rules.

So, any algorithm will be some form of trial-and-error.

Here are some additional issues I can think of, in addition to those you had mentioned:

  • It grabs the maximum width from getOption("width"), but is that what we want?
  • I haven't tested it, but I'm pretty sure that it's willing to break a line before a comma. This could be handled by listing the types of tokens that should not have a break before them, and breaking before the token before them.
  • It can't insert line breaks within a comment or a string literal. This means that a comment or a string that's too long is going to cause an infinite loop as well. That said, this can be addressed simply by not counting any token that is immediately preceded by a line break as "overflowing".
  • Changing a call from single-line to multiline changes the indentation preceding the overflowing token, which may change which token would have overflowed.

krivit avatar Apr 10 '18 23:04 krivit

@krivit can you open a PR anyways already with the WIP so we don't forget about it?

lorenzwalthert avatar Apr 24 '18 11:04 lorenzwalthert

@lorenzwalthert, done.

krivit avatar Apr 24 '18 12:04 krivit

Guys what's the status of this? Are you disabling lintr long lines checks when using styler and lintr together?

edmondop avatar Nov 29 '18 22:11 edmondop

I'd say the status is that there is an implementation referenced above that works in many cases but we have not had the time to think through it and make to work in general. So also there was a merge conflict so we did not rebase and quite a few changes from upstream are not contained in this branch.

lorenzwalthert avatar Nov 29 '18 23:11 lorenzwalthert

Have we ever thought about using base::deparse() to do the splitting? If not, this might be worth considering.

lorenzwalthert avatar Jan 18 '19 19:01 lorenzwalthert

Yes. base::deparse() does not enforce a maximum line length: it adds a line break after the first token to exceed width.cutoff, not before.

I had submitted a wishlist item to R's Bugzilla asking for this functionality a while ago (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17390), but they are unlikely to implement it, unless someone can write a concise C implementation.

Another possible use of deparse() is to try different width.cutoffs until the actual maximum width is below the specified value. However, this is suboptimal as well, especially if the lengths of tokens vary within and between expressions.

krivit avatar Jan 18 '19 19:01 krivit

Ok, thanks. I now remember. But good to have this in that thread too. Also: base::deparse() can't be used here I think because it drops comments, a thing that causes problems with {formatR}.

lorenzwalthert avatar Jan 18 '19 19:01 lorenzwalthert

I think break lines in a uniform way for a long line will be better.

black formatter for python run in such way.

Before:

ggplot(flow_df, aes(y = Freq, x = time, alluvium = leaf_id, stratum = progenitor)) + geom_lode() + geom_flow() + geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) + facet_grid(fish ~ cell_type)

Good:

ggplot(
  flow_df, aes(y = Freq, x = time, alluvium = leaf_id, stratum = progenitor)
) + geom_lode() + geom_flow() + geom_stratum() +
  geom_text(
    stat = "stratum",
    label.strata = TRUE
  ) + facet_grid(fish ~ cell_type)

Best:

ggplot(
  flow_df, aes(y = Freq, x = time, alluvium = leaf_id, stratum = progenitor)
) +
  geom_lode() +
  geom_flow() +
  geom_stratum() +
  geom_text(stat = "stratum", label.strata = TRUE) +
  facet_grid(fish ~ cell_type)

y9c avatar Apr 22 '19 02:04 y9c

@yech1990 sorry I can't follow. Can you make elaborate on this a bit more?

lorenzwalthert avatar Apr 22 '19 14:04 lorenzwalthert

@lorenzwalthert I can't find enough of R language formatter implementation to illustrate this problem. You can do the formattting test of the following python code with two different tools.

The input: (The first line of the code exceed 80 characters)

if ((a > 2) and (b > 3) and (c > 4) and (d > 5) and (e > 6) and (f > 7) and (g > 8) and (h > 9)):
    print("Yes!")

formatted by yapf

if ((a > 2) and (b > 3) and (c > 4) and (d > 5) and (e > 6) and (f > 7)
        and (g > 8) and (h > 9)):
    print("Yes!")

formatted by black

if (
    (a > 2)
    and (b > 3)
    and (c > 4)
    and (d > 5)
    and (e > 6)
    and (f > 7)
    and (g > 8)
    and (h > 9)
):
    print("Yes!")

Both Implementation solve the problem.

  • The first tool break the long line in the furthest 'breakable' position within the limit. formatR, which use the base::deparse() method, do the formatting in this way.
  • The second tool breaks the line by 'grammar unite' (that is a judgment in this example). Break long line in the second way make the code nicer and much more readable. Meanwhile, it is helpful for editing the code, since you you delete a whole line (dd in vim) quickly for editing the conditions.

Thus, I wish styler can format ggplot2 or dplyr one line express in the smarter way. It is very common to remove and add new operations (eg, + geom_bar()) in writing ggplot2/dplyr code.

y9c avatar Apr 22 '19 17:04 y9c

Ok, now I understand better, thanks. Since this is going to slow down styler, there must be a way to turn it off globally. I.e. one should be able to turn width = Inf.

I think to begin with, we could have a look at function calls (ggplot + calls are resolved with #544). I think they are 90% of the cases that are longer than 80 characters. If the input function call is longer than 80 characters, we could simply put each argument on a separate line. Also, we should probably not consider removing line breaks, just adding. We must make sure styling remains ideompetent, otherwise this will potentially lead to some troubles with caching. Also, it's a problem that line breaks at a character width depend on the spacing. And spacing comes after line breaks in the transformer order because indention is a spacing topic and depends on line breaks.

Reference: https://github.com/gadenbuie/grkstyle/blob/master/R/style.R#L50

Edit: This might have simplified with #705, where we factored indention out of space transformers. Indention formatting is now potentially completely independent of space formatting.

Edit: we should think of other styler functionally that will help us speeding things up. Maybe stylerignore could be used as some sort of cache? Or base indention and then only style the expression that needs re-arrangement?

lorenzwalthert avatar Apr 22 '19 17:04 lorenzwalthert

Hi folks, just wondering what the status is on this issue? It would be a helpful addition :-)

Thanks.

RobertMyles avatar Jun 30 '20 10:06 RobertMyles

I am not currently working on that. We first need to outline conceptually what the implementation (or a first version) will look like. I think what I outlined in https://github.com/r-lib/styler/issues/247#issuecomment-485481338 is a good start.

lorenzwalthert avatar Jun 30 '20 13:06 lorenzwalthert

Ok, thanks!

RobertMyles avatar Jun 30 '20 14:06 RobertMyles

But I admit, the growing number of 👍 in the issue is motivating 😄

lorenzwalthert avatar Jun 30 '20 15:06 lorenzwalthert

Hi guys, one year after the last question... :wink: is there any chance that this would be implemented in the near future? I've been using styler quite actively for a couple of months now and I think this is the only major missing piece.

euklid321 avatar Jul 16 '21 09:07 euklid321

I know this is high up on the wish list for many users but I anticipate this to be quite complex and time consuming to implement. Given other priorities in my life as well as in {styler}, I won't build this soon unfortunately. I hope you can understand this.

lorenzwalthert avatar Sep 11 '21 20:09 lorenzwalthert

For knitting PDF documents Carlos Luis Rivera posted a very useful answer on my StackOverflow post, which might be helpful for some people. It requires including the following to your header:

---
title: "R Notebook"
output: pdf_document
header-includes:
  - |
    ```{=latex}
    \usepackage{fvextra}
    \DefineVerbatimEnvironment{Highlighting}{Verbatim}{
      breaksymbolleft={}, 
      showspaces = false,
      showtabs = false,
      breaklines,
      commandchars=\\\{\}
    }
    ```
---

```{r}
knitr::opts_chunk$set(tidy="styler")
```

moritzpschwarz avatar Oct 04 '21 18:10 moritzpschwarz

I use styler to format R code. Lintr gives me warning on lines on 80 characters line limit. However it seems styler has no option to specify max number of characters per line. I'd rather enforce the line width automatically instead of manually. How do R users enforce line width limit in Rstudio or in vim?

❯ Rscript -e "styler::style_file(commandArgs(trailingOnly = TRUE),indent_by=4)" script.R

ssh352 avatar Jan 15 '24 01:01 ssh352

Hi everyone, I opened the issue #1192, and I just realized it's a duplicate of this.

Should I close my issue, or not?

Thanks.

maikol-solis avatar Apr 20 '24 13:04 maikol-solis