styler icon indicating copy to clipboard operation
styler copied to clipboard

Reverse direction of ->

Open krlmlr opened this issue 7 years ago • 20 comments

# Before:
data %>%
  mutate(...) %>%
  select(...) -> result

# After:
result <-
  data %>%
  mutate(...) %>%
  select(...)

Can we do this easily?

krlmlr avatar Jul 10 '18 09:07 krlmlr

Not sure, have to check. We put all expressions between pipes on the same level, but the assignment is on a different level:

code <- "
data %>%
mutate(...) %>%
select(...) -> result
"
styler:::create_tree(strsplit(code, "\n", fixed = TRUE))
#>                                                  levelName
#> 1  ROOT (token: short_text [lag_newlines/spaces] {pos_id})
#> 2   °--expr:  [0/0] {1}                                   
#> 3       ¦--expr:  [0/0] {3}                               
#> 4       ¦   °--SYMBOL_FUNCTION_CALL: c [0/0] {2}          
#> 5       ¦--'(': ( [0/0] {4}                               
#> 6       ¦--expr:  [0/0] {6}                               
#> 7       ¦   °--STR_CONST: "" [0/0] {5}                    
#> 8       ¦--',': , [0/1] {7}                               
#> 9       ¦--expr:  [0/0] {9}                               
#> 10      ¦   °--STR_CONST: "data [0/0] {8}                 
#> 11      ¦--',': , [0/1] {10}                              
#> 12      ¦--expr:  [0/0] {12}                              
#> 13      ¦   °--STR_CONST: "muta [0/0] {11}                
#> 14      ¦--',': , [0/1] {13}                              
#> 15      ¦--expr:  [0/0] {15}                              
#> 16      ¦   °--STR_CONST: "sele [0/0] {14}                
#> 17      °--')': ) [0/0] {16}

pd <- styler:::compute_parse_data_nested(code)
pd$child[[1]]
#> # A tibble: 3 x 15
#>      id pos_id line1  col1 line2  col2 parent token   terminal text  short
#>   <int>  <int> <int> <int> <int> <int>  <int> <chr>   <lgl>    <chr> <chr>
#> 1    29      2     2     1     4    11     33 expr    FALSE    ""    ""   
#> 2    28     22     4    13     4    14     33 RIGHT_… TRUE     ->    ->   
#> 3    32     24     4    16     4    21     33 expr    FALSE    ""    ""   
#> # ... with 4 more variables: token_before <chr>, token_after <chr>,
#> #   internal <lgl>, child <list>
pd$child[[1]]$child[[1]]
#> # A tibble: 5 x 15
#>      id pos_id line1  col1 line2  col2 parent token   terminal text  short
#>   <int>  <int> <int> <int> <int> <int>  <int> <chr>   <lgl>    <chr> <chr>
#> 1     5      5     2     1     2     4     17 expr    FALSE    ""    ""   
#> 2     4      6     2     6     2     8     17 SPECIA… TRUE     %>%   %>%  
#> 3    15      7     3     1     3    11     17 expr    FALSE    ""    ""   
#> 4    16     14     3    13     3    15     29 SPECIA… TRUE     %>%   %>%  
#> 5    27     15     4     1     4    11     29 expr    FALSE    ""    ""   
#> # ... with 4 more variables: token_before <chr>, token_after <chr>,
#> #   internal <lgl>, child <list>

Created on 2018-07-15 by the reprex package (v0.2.0).

One hurdle is that every token has to have an pos_id for sorting and I don't remember exactly how that worked. Probably possible but not a low-hanging fruit. Do you see this pattern often in code? Probably if styler could reformat it, people would write code like this more because it's more convenient and they know styler is going to reformat.

lorenzwalthert avatar Jul 14 '18 23:07 lorenzwalthert

Thanks. If we just reversed the rows of pd$child[[1]] (except id which we'd leave untouched) and changed RIGHT_... to LEFT_...? Do the nests refer to parent IDs in any way?

It's just a use case, I can change it manually but I thought it might be useful.

krlmlr avatar Jul 14 '18 23:07 krlmlr

Yes, I thought about not touching pos_id. But this might have unexpected consequences, although I am not aware of any. We might just want to try.

lorenzwalthert avatar Jul 14 '18 23:07 lorenzwalthert

FYI: CodeDepends reverses ->. cc @gmbecker.

CodeDepends::getInputs(quote(1 -> x))@code
#> x <- 1

wlandau avatar Aug 18 '18 17:08 wlandau

Would love to take credit for this, but it's the parser doing the work here. There isn't actually a separate -> function/"instruction" within the evaluator:

> parse(text = "5 -> x")
expression(5 -> x)
> parse(text = "5 -> x")[[1]]
x <- 5

See also:

> `<-`
.Primitive("<-")
> `->`
Error: object '->' not found

gmbecker avatar Aug 20 '18 15:08 gmbecker

So could we simply parse / deparse and parse again? Does not seem to have a large speed impact. I.e. styling of this example takes ~100 milliseconds, and we will add 0.5 milliseconds.

code <- "5 -> x; {{{{{{{{x}}}}}}}}"
twice <- quote(styler:::get_parse_data(deparse(parse(text = code)[[1]])))
once <-  quote(styler:::get_parse_data(code))
microbenchmark::microbenchmark(eval(once), eval(twice))
#> Unit: milliseconds
#>         expr      min       lq     mean   median       uq        max neval
#>   eval(once) 1.430916 1.701178 3.506884 2.056408 2.548293 125.718171   100
#>  eval(twice) 1.094703 1.381788 1.853659 1.646930 2.068233   6.485565   100

microbenchmark::microbenchmark(styler::style_text(code))
#> Unit: milliseconds
#>                      expr      min       lq     mean   median       uq       max neval
#>  styler::style_text(code) 86.46711 123.8118 147.6735 151.8006 167.6296  225.5865   100

Created on 2018-09-12 by the reprex package (v0.2.0).

Well no I think it's a bad idea because we said transformations should not be hard-coded, but go into the transformers.

lorenzwalthert avatar Sep 12 '18 12:09 lorenzwalthert

Glad @lorenzwalthert pointed me to this thread again. Allowing the parser to reverse arrows seems to create an edge case.

parse(text = "function(){} -> f")[[1]]
#> function() f <- {
#> }

wlandau avatar Oct 30 '18 01:10 wlandau

Good to know.

lorenzwalthert avatar Oct 30 '18 07:10 lorenzwalthert

It seems that connecting function() to its definition has lower precedence than -> in R's parser:

body(function() {} -> f)
#> f <- {
#> }
body(f <- function() {})
#> {
#> }

Created on 2018-12-14 by the reprex package (v0.2.1.9000)

Anyway, parsing-deparsing-parsing doesn't sound too appealing to me. Let's shelve for now.

krlmlr avatar Dec 14 '18 02:12 krlmlr

Any chance we can get this parser bug fixed in base R? I don't have an account to file a bug report. Alternatively, just check if we have function() {} -> x and not revert anything.

lorenzwalthert avatar Oct 31 '19 08:10 lorenzwalthert

I don't think we need to take any action here: you can't define a function with right assignment.

function() {} -> f
#> function() {} -> f
f
#> Error in eval(expr, envir, enclos): object 'f' not found

ff <- {function() {} -> f}
body(ff)
#> f <- {
#> }

Created on 2019-10-31 by the reprex package (v0.3.0)

krlmlr avatar Oct 31 '19 08:10 krlmlr

For the record I don't think this is a bug in the parser as it is almost certainly intentional. {x <- 5} and {5 -> x} are equivalent so it doesn't really make sense for the evaluator to have different paths for -> and <-.

As for the defining a function thing, I mean... you can if you really want to

> {function() {}} -> f
> f
function() {}
> rm(f)
> (function() {}) -> f
> f
function() {}

I don't really know why you would though.

I'm a bit confused though about the larger context there, though. I guess you guys are transforming raw text, rather than parsed code, and without taking into account what the parser will actually think the code says? If so yeah I guess you'd have to do it yourself

gmbecker avatar Oct 31 '19 22:10 gmbecker

I see a problem when simply reversing lhs <- rhs to rhs -> lhs isn't equivalent for the parser. The function example is an important one, are there others?

styler:::create_tree("f <- function() 5")
#>                                                  levelName
#> 1  ROOT (token: short_text [lag_newlines/spaces] {pos_id})
#> 2   °--expr:  [0/0] {1}                                   
#> 3       ¦--expr:  [0/1] {3}                               
#> 4       ¦   °--SYMBOL: f [0/0] {2}                        
#> 5       ¦--LEFT_ASSIGN: <- [0/1] {4}                      
#> 6       °--expr:  [0/0] {5}                               
#> 7           ¦--FUNCTION: funct [0/0] {6}                  
#> 8           ¦--'(': ( [0/0] {7}                           
#> 9           ¦--')': ) [0/1] {8}                           
#> 10          °--expr:  [0/0] {10}                          
#> 11              °--NUM_CONST: 5 [0/0] {9}
styler:::create_tree("function() 5 -> f")
#>                                                  levelName
#> 1  ROOT (token: short_text [lag_newlines/spaces] {pos_id})
#> 2   °--expr:  [0/0] {1}                                   
#> 3       ¦--FUNCTION: funct [0/0] {2}                      
#> 4       ¦--'(': ( [0/0] {3}                               
#> 5       ¦--')': ) [0/1] {4}                               
#> 6       °--expr:  [0/0] {5}                               
#> 7           ¦--expr:  [0/1] {7}                           
#> 8           ¦   °--NUM_CONST: 5 [0/0] {6}                 
#> 9           ¦--RIGHT_ASSIGN: -> [0/1] {8}                 
#> 10          °--expr:  [0/0] {10}                          
#> 11              °--SYMBOL: f [0/0] {9}

Created on 2019-10-31 by the reprex package (v0.3.0)

krlmlr avatar Oct 31 '19 22:10 krlmlr

Those are equivalent but lhs and rhs aren't what you seem to want them to be.

In fact, at the top level expression, there is no RHS in either case because there is no assignment occuring. the top level expressions are equivalent to

function() { x <-5 }

and

function(){ 5 -> x}

Respectively (I know they're not technically identical, but disregard the brackets since in practice they don't do anything but clarify what the parser will grab in this case). Neither of these expressions contain top level assignment, which means neither of them have a RHS at all.

Now, the first expression within the body of each of those functions IS an assignment, which does have an LHS and RHS, and which flipping the assignment direction and LHS and RHS are fully equivalent.

gmbecker avatar Oct 31 '19 22:10 gmbecker

A bit more specifically to the case you're brought up I don't think there is any way in function() 5 -> x for R to know that it should stop parsing the function body at 5 without <- and -> having pretty radically different precedences, which I think we definitly would not want at all.

We certainly wouldn't want

function() x <- 5

To parse equivalent as

(function() x) <-  5)

gmbecker avatar Oct 31 '19 22:10 gmbecker

I don't really know why you would though.

I'm a bit confused though about the larger context there, though. I guess you guys are transforming raw text, rather than parsed code, and without taking into account what the parser will actually think the code says?

Not sure I can follow. If we just can as.character(parse(text = "5 -> x")), to obtain "x <- 5" (and then style this code), we'd probably prefer that opposed to re-arranging tokens ourself within the nested parse table. It means less complexity on our side plus probably also better performance in terms of speed. What makes you think this is a bad idea.

Only objection from my side for this approach: It's not a transformer function that operates on the parse table, violating one of styler's design principles that if not absolutely necessary, all style rules should be implemented as transformers.

lorenzwalthert avatar Oct 31 '19 23:10 lorenzwalthert

On Thu, Oct 31, 2019 at 4:01 PM Lorenz Walthert [email protected] wrote:

I don't really know why you would though.

I'm a bit confused though about the larger context there, though. I guess you guys are transforming raw text, rather than parsed code, and without taking into account what the parser will actually think the code says?

Not sure I can follow. If we just can as.character(parse(text = "5 -> x")), to obtain "x <- 5" (and then style this code), we'd probably prefer that opposed to re-arranging tokens ourself within the parse table. It means less complexity on our side plus probably also better performance in terms of speed. What makes you think this is a bad idea.

Oh, no I think that is a good idea (honesty I would have assumed you were doing something like that already given no familiarity with the package's inner workings). I was saying I don't know why a code author would want so badly to define a functtion and assign it to it's symbol via -> instead of <-.

The one issue I can think of with round-trip parsing is what it would do to comments. I don't know if you guys retain and handle comments or and I haven't looked at the srcref stuff recently enough to nkow how far you could get for that after parsing, but it is something to keep in mind, I think

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/r-lib/styler/issues/409?email_source=notifications&email_token=AAG53MJZ4VHPCVKH6PYR3S3QRNPTDA5CNFSM4FJEMHP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECZQCVY#issuecomment-548602199, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MPHBABIPSKZTUGEQO3QRNPTDANCNFSM4FJEMHPQ .

gmbecker avatar Oct 31 '19 23:10 gmbecker

@gmbecker: I'm comparing f <- function() 5 with function() 5 -> f . These two are parsed differently.

If we only reverse -> towards <- this isn't relevant, but we can't simply reverse <- to -> in all cases.

krlmlr avatar Oct 31 '19 23:10 krlmlr

On Thu, Oct 31, 2019 at 4:08 PM Kirill Müller [email protected] wrote:

@gmbecker https://github.com/gmbecker: I'm comparing f <- function() 5 with function() 5 -> f . These two are parsed differently.

If we only reverse -> towards <- this isn't relevant, but we can't simply reverse <- to -> in all cases.

That's what I'm saying though, it doesn't matter that those look equivalent based on the characters, they parse (and should parse) radically differently. The correct simple reversal of function() 5 ->f is function() f <- 5, NOT f <- function() 5

Put another way, -> is not at the top of the AST in the second example, so reversing it and then placing it at the top of the AST will, of course, give you a very different non-equivalent expression. This is why, as I pointed out in my most recent response to Lorenz, I DO actually think you guys should be round-trip parsing before you do any transformations. Though actually now that I thik about it you're gonna have a bad time with rlang-based expressions if you do that because they violate the rules of the parser. So you'll have to special-case something I guess.

You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/r-lib/styler/issues/409?email_source=notifications&email_token=AAG53ML7KWVFZBWAYQGYKWLQRNQOTA5CNFSM4FJEMHP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECZQPWQ#issuecomment-548603866, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MPDIJBXLP7COCR626DQRNQOTANCNFSM4FJEMHPQ .

gmbecker avatar Oct 31 '19 23:10 gmbecker

Good point about the comments, I think we can't use the approach with parsing:

as.character(parse(text = "x <- # comment
 3"))
#> [1] "x <- 3"

Created on 2019-11-01 by the reprex package (v0.3.0)

lorenzwalthert avatar Nov 01 '19 06:11 lorenzwalthert