styler icon indicating copy to clipboard operation
styler copied to clipboard

Detect alignment

Open krlmlr opened this issue 8 years ago • 6 comments
trafficstars

UPDATE: Alignment detection was implemented for function calls: https://styler.r-lib.org/articles/detect-alignment.html

  • [x] multiple named arguments in function calls (https://github.com/r-lib/testthat/pull/663#discussion_r146549784)
  • [ ] multiple unnamed arguments in single-line function calls (https://github.com/r-lib/testthat/pull/663/commits/0f61d06bf0e3bd382831ffbe6b2bf8f13cff0bbc#diff-3f2ca9f70d2f3617419c503141c9e6fbR6)
  • [ ] comments (https://github.com/r-lib/testthat/pull/663#discussion_r146549784)
  • [ ] ~ in function calls like case_when() and friends.

Rough outline:

  • Look at the horizontal position of the corresponding items in the original code
  • If the majority is aligned and the alignment involves >1 space somewhere, we store this somewhere in the nested parse data
  • When rendering, the alignment is restored

krlmlr avatar Oct 25 '17 06:10 krlmlr

Related: #317

lorenzwalthert avatar Jan 06 '18 14:01 lorenzwalthert

May want to check overlap with https://github.com/seasmith/AlignAssign and learn from it.

lorenzwalthert avatar Jun 17 '18 23:06 lorenzwalthert

I think we should take a different path. I see two important cases now:

Multi-line function calls with named arguments

They should be classified as aligned if argument values are left-aligned, argument values are right-aligned and commas match position.

call(
  x   = 1, kdd  =  2,
  xy  = 2, n    = 33,
)

We first detect if the arguments are aligned. This is the case when they match the above layout. We can implement this as follows:

  • argument name and = have the same line1 (i.e. the start of a token) for multiple lines for the first column (justified below).
  • spacing around comma is correct (none before, at least one after).
  • and at least one space around =.

These checks, however, would classify the following as aligned

call(
  x   = 1, kdd  =  2,
  xy  = 22, n    = 33,
)

Because the argument values are not in the same nest as the argument name and =, checking their value is expensive, as it is potentially again a complicated expression, not just a terminal token, which would require constructing text recursively to know its length. Hence, for a first pass, we can assume that a call is aligned if the argument name and = are aligned plus the other criteria. Later, we could check for that and if it holds (which will not be the case for 90% of the cases we look at now), we can do the expensive checks and hence not imposing any slowdown for 90% of the cases.

Multi-line function calls with unnamed arguments

They should be classified as aligned if the first column is left-aligned (to respect indention rules), the remaining columns are right-aligned and all commas but those in the first column match position.

call(
  1111, 2,
  2,   33,
)

This would involve a similar procedure, namely:

  • first column has same number of lag spaces.
  • spacing around comma is correct (none before, > 0 after).
  • commas from all columns but the first one are aligned (needs construction of text because of nested expressions).

Mixed function calls

call(
  x = 2, 4, c  =      f(a, b),
  1,    44, de = ffk(a, b, c)
)

Hence, in the most general form:

  • lag spaces of column 1 must agree.
  • spacing around comma (0 before, > 1 after) and spacing around = (at least one around).
  • all positions of commas of col > 2 must agree (needs recursive creation of `text?).

Checks must be performed for any multi-line function call (with or without =), which is quite expensive, in particular, the recursive creation of text. I think as we already group by line, we should greedily check column by column and within that, we should check one line at the time and fail fast. E.g. we don't need to check line 3, column 1 if we know that line 1 and 2 are not aligned for column 1.

call(1,
  another = 3,
  text = f(a, b = 33, x == 3)
)

The mixed case is the general case and hence, the pure other cases were only developed here because we can resolve them in a unified manner (a one-stop solution).

It is not clear what happens with multi-line calls that have multi-line calls within them, but arguably the rules should not depend on that:

call(
  first, second___ = call(
    x = 3, a = f(g), ggh
  ), third_row_of_first_column, 
  col1., col2_l    = 9
  )

Assignment

a  <- 3
ab <- 3

This is a case I have not looked at because it was not so common. We can have a look later. What do you think? cc: @pat-s

Comments

Not looked at

ab # here
a  # here

lorenzwalthert avatar May 11 '19 09:05 lorenzwalthert

Sounds good. Would fit good into the "strict" idea of styler then. There might be even more problems to tackle for R6 notation. Supporting custom alignment of dictionaries is an important one and a good one to start with :) (I guess it applies more often to R6 than to S3)

pat-s avatar May 12 '19 09:05 pat-s

WIP at https://github.com/lorenzwalthert/styler/tree/detect-alignment.

lorenzwalthert avatar Jul 07 '19 12:07 lorenzwalthert

@pat-s @krlmlr are you interested in reviewing the vignette on alignment detection? https://styler.r-lib.org/articles/detect-alignment.html? It was hard for me to write it such that it's easy to understand, yet most of it should be pretty intuitive. But I feel the text is complicated. It would be nice if we could get that right before styler v1.2.0.

lorenzwalthert avatar Oct 13 '19 20:10 lorenzwalthert