styler
styler copied to clipboard
Reverse direction of ->
# Before:
data %>%
mutate(...) %>%
select(...) -> result
# After:
result <-
data %>%
mutate(...) %>%
select(...)
Can we do this easily?
Not sure, have to check. We put all expressions between pipes on the same level, but the assignment is on a different level:
code <- "
data %>%
mutate(...) %>%
select(...) -> result
"
styler:::create_tree(strsplit(code, "\n", fixed = TRUE))
#> levelName
#> 1 ROOT (token: short_text [lag_newlines/spaces] {pos_id})
#> 2 °--expr: [0/0] {1}
#> 3 ¦--expr: [0/0] {3}
#> 4 ¦ °--SYMBOL_FUNCTION_CALL: c [0/0] {2}
#> 5 ¦--'(': ( [0/0] {4}
#> 6 ¦--expr: [0/0] {6}
#> 7 ¦ °--STR_CONST: "" [0/0] {5}
#> 8 ¦--',': , [0/1] {7}
#> 9 ¦--expr: [0/0] {9}
#> 10 ¦ °--STR_CONST: "data [0/0] {8}
#> 11 ¦--',': , [0/1] {10}
#> 12 ¦--expr: [0/0] {12}
#> 13 ¦ °--STR_CONST: "muta [0/0] {11}
#> 14 ¦--',': , [0/1] {13}
#> 15 ¦--expr: [0/0] {15}
#> 16 ¦ °--STR_CONST: "sele [0/0] {14}
#> 17 °--')': ) [0/0] {16}
pd <- styler:::compute_parse_data_nested(code)
pd$child[[1]]
#> # A tibble: 3 x 15
#> id pos_id line1 col1 line2 col2 parent token terminal text short
#> <int> <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
#> 1 29 2 2 1 4 11 33 expr FALSE "" ""
#> 2 28 22 4 13 4 14 33 RIGHT_… TRUE -> ->
#> 3 32 24 4 16 4 21 33 expr FALSE "" ""
#> # ... with 4 more variables: token_before <chr>, token_after <chr>,
#> # internal <lgl>, child <list>
pd$child[[1]]$child[[1]]
#> # A tibble: 5 x 15
#> id pos_id line1 col1 line2 col2 parent token terminal text short
#> <int> <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
#> 1 5 5 2 1 2 4 17 expr FALSE "" ""
#> 2 4 6 2 6 2 8 17 SPECIA… TRUE %>% %>%
#> 3 15 7 3 1 3 11 17 expr FALSE "" ""
#> 4 16 14 3 13 3 15 29 SPECIA… TRUE %>% %>%
#> 5 27 15 4 1 4 11 29 expr FALSE "" ""
#> # ... with 4 more variables: token_before <chr>, token_after <chr>,
#> # internal <lgl>, child <list>
Created on 2018-07-15 by the reprex package (v0.2.0).
One hurdle is that every token has to have an pos_id for sorting and I don't remember exactly how that worked. Probably possible but not a low-hanging fruit. Do you see this pattern often in code? Probably if styler could reformat it, people would write code like this more because it's more convenient and they know styler is going to reformat.
Thanks. If we just reversed the rows of pd$child[[1]] (except id which we'd leave untouched) and changed RIGHT_... to LEFT_...? Do the nests refer to parent IDs in any way?
It's just a use case, I can change it manually but I thought it might be useful.
Yes, I thought about not touching pos_id. But this might have unexpected consequences, although I am not aware of any. We might just want to try.
Would love to take credit for this, but it's the parser doing the work here. There isn't actually a separate -> function/"instruction" within the evaluator:
> parse(text = "5 -> x")
expression(5 -> x)
> parse(text = "5 -> x")[[1]]
x <- 5
See also:
> `<-`
.Primitive("<-")
> `->`
Error: object '->' not found
So could we simply parse / deparse and parse again? Does not seem to have a large speed impact. I.e. styling of this example takes ~100 milliseconds, and we will add 0.5 milliseconds.
code <- "5 -> x; {{{{{{{{x}}}}}}}}"
twice <- quote(styler:::get_parse_data(deparse(parse(text = code)[[1]])))
once <- quote(styler:::get_parse_data(code))
microbenchmark::microbenchmark(eval(once), eval(twice))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> eval(once) 1.430916 1.701178 3.506884 2.056408 2.548293 125.718171 100
#> eval(twice) 1.094703 1.381788 1.853659 1.646930 2.068233 6.485565 100
microbenchmark::microbenchmark(styler::style_text(code))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> styler::style_text(code) 86.46711 123.8118 147.6735 151.8006 167.6296 225.5865 100
Created on 2018-09-12 by the reprex package (v0.2.0).
Well no I think it's a bad idea because we said transformations should not be hard-coded, but go into the transformers.
Glad @lorenzwalthert pointed me to this thread again. Allowing the parser to reverse arrows seems to create an edge case.
parse(text = "function(){} -> f")[[1]]
#> function() f <- {
#> }
Good to know.
It seems that connecting function() to its definition has lower precedence than -> in R's parser:
body(function() {} -> f)
#> f <- {
#> }
body(f <- function() {})
#> {
#> }
Created on 2018-12-14 by the reprex package (v0.2.1.9000)
Anyway, parsing-deparsing-parsing doesn't sound too appealing to me. Let's shelve for now.
Any chance we can get this parser bug fixed in base R? I don't have an account to file a bug report. Alternatively, just check if we have function() {} -> x and not revert anything.
I don't think we need to take any action here: you can't define a function with right assignment.
function() {} -> f
#> function() {} -> f
f
#> Error in eval(expr, envir, enclos): object 'f' not found
ff <- {function() {} -> f}
body(ff)
#> f <- {
#> }
Created on 2019-10-31 by the reprex package (v0.3.0)
For the record I don't think this is a bug in the parser as it is almost certainly intentional. {x <- 5} and {5 -> x} are equivalent so it doesn't really make sense for the evaluator to have different paths for -> and <-.
As for the defining a function thing, I mean... you can if you really want to
> {function() {}} -> f
> f
function() {}
> rm(f)
> (function() {}) -> f
> f
function() {}
I don't really know why you would though.
I'm a bit confused though about the larger context there, though. I guess you guys are transforming raw text, rather than parsed code, and without taking into account what the parser will actually think the code says? If so yeah I guess you'd have to do it yourself
I see a problem when simply reversing lhs <- rhs to rhs -> lhs isn't equivalent for the parser. The function example is an important one, are there others?
styler:::create_tree("f <- function() 5")
#> levelName
#> 1 ROOT (token: short_text [lag_newlines/spaces] {pos_id})
#> 2 °--expr: [0/0] {1}
#> 3 ¦--expr: [0/1] {3}
#> 4 ¦ °--SYMBOL: f [0/0] {2}
#> 5 ¦--LEFT_ASSIGN: <- [0/1] {4}
#> 6 °--expr: [0/0] {5}
#> 7 ¦--FUNCTION: funct [0/0] {6}
#> 8 ¦--'(': ( [0/0] {7}
#> 9 ¦--')': ) [0/1] {8}
#> 10 °--expr: [0/0] {10}
#> 11 °--NUM_CONST: 5 [0/0] {9}
styler:::create_tree("function() 5 -> f")
#> levelName
#> 1 ROOT (token: short_text [lag_newlines/spaces] {pos_id})
#> 2 °--expr: [0/0] {1}
#> 3 ¦--FUNCTION: funct [0/0] {2}
#> 4 ¦--'(': ( [0/0] {3}
#> 5 ¦--')': ) [0/1] {4}
#> 6 °--expr: [0/0] {5}
#> 7 ¦--expr: [0/1] {7}
#> 8 ¦ °--NUM_CONST: 5 [0/0] {6}
#> 9 ¦--RIGHT_ASSIGN: -> [0/1] {8}
#> 10 °--expr: [0/0] {10}
#> 11 °--SYMBOL: f [0/0] {9}
Created on 2019-10-31 by the reprex package (v0.3.0)
Those are equivalent but lhs and rhs aren't what you seem to want them to be.
In fact, at the top level expression, there is no RHS in either case because there is no assignment occuring. the top level expressions are equivalent to
function() { x <-5 }
and
function(){ 5 -> x}
Respectively (I know they're not technically identical, but disregard the brackets since in practice they don't do anything but clarify what the parser will grab in this case). Neither of these expressions contain top level assignment, which means neither of them have a RHS at all.
Now, the first expression within the body of each of those functions IS an assignment, which does have an LHS and RHS, and which flipping the assignment direction and LHS and RHS are fully equivalent.
A bit more specifically to the case you're brought up I don't think there is any way in function() 5 -> x for R to know that it should stop parsing the function body at 5 without <- and -> having pretty radically different precedences, which I think we definitly would not want at all.
We certainly wouldn't want
function() x <- 5
To parse equivalent as
(function() x) <- 5)
I don't really know why you would though.
I'm a bit confused though about the larger context there, though. I guess you guys are transforming raw text, rather than parsed code, and without taking into account what the parser will actually think the code says?
Not sure I can follow. If we just can as.character(parse(text = "5 -> x")), to obtain "x <- 5" (and then style this code), we'd probably prefer that opposed to re-arranging tokens ourself within the nested parse table. It means less complexity on our side plus probably also better performance in terms of speed. What makes you think this is a bad idea.
Only objection from my side for this approach: It's not a transformer function that operates on the parse table, violating one of styler's design principles that if not absolutely necessary, all style rules should be implemented as transformers.
On Thu, Oct 31, 2019 at 4:01 PM Lorenz Walthert [email protected] wrote:
I don't really know why you would though.
I'm a bit confused though about the larger context there, though. I guess you guys are transforming raw text, rather than parsed code, and without taking into account what the parser will actually think the code says?
Not sure I can follow. If we just can as.character(parse(text = "5 -> x")), to obtain "x <- 5" (and then style this code), we'd probably prefer that opposed to re-arranging tokens ourself within the parse table. It means less complexity on our side plus probably also better performance in terms of speed. What makes you think this is a bad idea.
Oh, no I think that is a good idea (honesty I would have assumed you were
doing something like that already given no familiarity with the package's
inner workings). I was saying I don't know why a code author would want so
badly to define a functtion and assign it to it's symbol via -> instead
of <-.
The one issue I can think of with round-trip parsing is what it would do to comments. I don't know if you guys retain and handle comments or and I haven't looked at the srcref stuff recently enough to nkow how far you could get for that after parsing, but it is something to keep in mind, I think
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/r-lib/styler/issues/409?email_source=notifications&email_token=AAG53MJZ4VHPCVKH6PYR3S3QRNPTDA5CNFSM4FJEMHP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECZQCVY#issuecomment-548602199, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MPHBABIPSKZTUGEQO3QRNPTDANCNFSM4FJEMHPQ .
@gmbecker: I'm comparing f <- function() 5 with function() 5 -> f . These two are parsed differently.
If we only reverse -> towards <- this isn't relevant, but we can't simply reverse <- to -> in all cases.
On Thu, Oct 31, 2019 at 4:08 PM Kirill Müller [email protected] wrote:
@gmbecker https://github.com/gmbecker: I'm comparing f <- function() 5 with function() 5 -> f . These two are parsed differently.
If we only reverse -> towards <- this isn't relevant, but we can't simply reverse <- to -> in all cases.
That's what I'm saying though, it doesn't matter that those look equivalent
based on the characters, they parse (and should parse) radically
differently. The correct simple reversal of function() 5 ->f is
function() f <- 5, NOT f <- function() 5
Put another way, -> is not at the top of the AST in the second example,
so reversing it and then placing it at the top of the AST will, of course,
give you a very different non-equivalent expression. This is why, as I
pointed out in my most recent response to Lorenz, I DO actually think you
guys should be round-trip parsing before you do any transformations.
Though actually now that I thik about it you're gonna have a bad time with
rlang-based expressions if you do that because they violate the rules of
the parser. So you'll have to special-case something I guess.
—
You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/r-lib/styler/issues/409?email_source=notifications&email_token=AAG53ML7KWVFZBWAYQGYKWLQRNQOTA5CNFSM4FJEMHP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECZQPWQ#issuecomment-548603866, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MPDIJBXLP7COCR626DQRNQOTANCNFSM4FJEMHPQ .
Good point about the comments, I think we can't use the approach with parsing:
as.character(parse(text = "x <- # comment
3"))
#> [1] "x <- 3"
Created on 2019-11-01 by the reprex package (v0.3.0)