
headless anonymous function (->) syntax

Open rapus95 opened this issue 3 years ago • 163 comments

Edit 3: Still sold on the original simple idea with optional extensions, as in the previous edits, or combined with https://github.com/JuliaLang/julia/pull/53946. The previous edits highlight and explore different ideas to stretch into, all adding their own value to different parts of the ecosystem. For a glimpse of the simple approach, have a look at the description after all the edited cross-references.


Edit 2: again newly progressed state of this proposal: https://github.com/JuliaLang/julia/issues/38713#issuecomment-1436118670


Edit 1: current state of this proposal: https://github.com/JuliaLang/julia/issues/38713#issuecomment-1188977419


Since https://github.com/JuliaLang/julia/pull/24990 stalls on the question of what the right amount of tight capturing is

Idea

I want to propose a headless -> variant which has the same scoping mechanics as (args...)-> but automatically collects all not-yet-captured underscores into an argument list. EDIT: Nesting will follow the same rules as variable shadowing, that is, the underscore binds to the tightest headless -> it can find.

Before                                 After
lfold((x,y)->x+2y, A)                  lfold(->_+2_,A)
lfold((x,y)->sin(x)-cos(y), A)         lfold(->sin(_)-cos(_), A)
map(x->5x+2, A)                        map(->5_+2,A)
map(x->f(x.a), A)                      map(->f(_.a),A)
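For concreteness, here is a rough, macro-based approximation of the proposed rewrite that one can try today. The name @headless is hypothetical; this sketch ignores nesting, keyword arguments, and the type-parameter questions discussed further down, and the real feature would be implemented in the parser rather than as a macro.

# Hypothetical @headless macro: collects each bare _ into a fresh argument, left to right.
macro headless(ex)
    args = Symbol[]
    rewrite(x) = x                                                   # literals, LineNumberNodes, etc. pass through
    rewrite(s::Symbol) = s === :_ ? (a = gensym("arg"); push!(args, a); a) : s
    rewrite(e::Expr) = Expr(e.head, map(rewrite, e.args)...)         # recurse, preserving left-to-right order
    body = rewrite(ex)
    return esc(Expr(:->, Expr(:tuple, args...), body))               # (arg1, arg2, ...) -> body
end

foldl(@headless(_ + _), [1, 2, 3])     # ≈ foldl((x, y) -> x + y, [1, 2, 3]) == 6
map(@headless(_^2 + 1), [1, 2, 3])     # ≈ map(x -> x^2 + 1, [1, 2, 3]) == [2, 5, 10]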

Advantage(s)

In small anonymous functions, underscores as variables can increase readability since they stand out a lot more than ordinary letters. For multiple-argument cases like anonymous functions for reduce/lfold it can even save a decent amount of characters. Overall it reads very intuitively as "start here, and whatever arguments you get, just drop them into the slots from left to right":

      -> ---| -----|
            V      V
lfold(->sin(_)-cos(_), A)

Sure, some more complex options like reordering ((x,y)->(y,x)), argument slurping ((x...)->x), and probably some other cases won't be possible, but if everything were possible in the headless variant we wouldn't have introduced the head in the first place.

Feasibility

  1. Both a leading -> and an _ on the right-hand side (value side) error on 1.5, so this shouldn't be breaking.
  2. Since it uses the well-defined scoping of ordinary anonymous functions, it should be easy to (a) switch between both variants mentally and (b) reuse most of the current parser code, just extending it to collect/replace underscores.

Compatibility with #24990

It shouldn't clash with the result of #24990 because that focuses more on ~~tight single argument~~ very tight argument cases. And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to put 2 more characters (->) in the right place to make that underscore standalone once again.

rapus95 avatar Dec 05 '20 03:12 rapus95

After that long, inconclusive debate, I think I've also come to the conclusion that having an explicit marker is better. Headless -> is the "natural" marker for this, so there you have it: it's simple and unambiguous. It even leaves room for letting Array{_, 3} be a shorthand for Array{<:Any, 3}. We would have to decide what to do in cases like -> (_, Array{_, 3}), but I would argue it would probably be best to just make that an error and not allow _ in type-parameter position inside of a headless lambda. That would need to be decided when implementing this, since making it an error only after the feature is implemented would be a breaking change.

StefanKarpinski avatar Dec 05 '20 19:12 StefanKarpinski

To clarify, the question is which of these -> (_, Array{_, 3}) would mean:

  • x -> (x, Array{<:Any, 3})
  • (x, y) -> (x, Array{y, 3})

Both could potentially make sense. My suggestion is to make it an error and force the user to either use a normal lambda or not use _ as a type parameter. Alternatively, we could say that _ always "binds" to the tightest thing it could. That's also a consideration in the presence of nested headless lambdas. For example, it seems only sensible to interpret -> (_, -> _ + _) as meaning x -> (x, (y, z) -> y + z), so you could make the same case for -> (_, Array{_, 3}) that the _ as an argument to Array should mean Array{<:Any} since it's innermost.

StefanKarpinski avatar Dec 05 '20 19:12 StefanKarpinski

Sure, some more complex options like reordering ((x,y)->(y,x))

One solution is to say that inside of a headless anonymous function _n refers to the nth argument. So (x, y) -> (y, x) would be written as

-> (_2, _1)

There is precedent for this in Clojure:

The function literal supports multiple arguments via %, %n, and %&.

#(println %1 %2 %3)

and Mathematica:

#n represents the nth argument.

In[1] := f[#2, #1] &[x, y]
Out[1] = f[y, x]

yurivish avatar Dec 05 '20 19:12 yurivish

Alternatively, we could say that _ always "binds" to the tightest thing it could.

that's exactly what I meant when saying

automatically collects all not-yet-captured underscores into an argument list.

so yes, I'd clearly be in favor of making them bind the tightest. Regarding the case of a parametric underscore in a headless lambda, I'd propose to special-case that in the parser already (or wherever that belongs 🙈) but make it an error for now. That way we're free to add a proper rule once we've found a good solution, without being breaking. Though, right now I'm thinking about the following idea:

sugar                   expanded
->(_, Array{_, 1})      x->(x, Array{<:Any, 1})
->(_, Array{<:_, 1})    (x,y)->(x, Array{<:y, 1})

For types, the 2nd approach should work in most cases, since <:LeafType == LeafType for leaf types, while one rarely needs actual abstract types as fixed parameters (and even then we have AbstractType <: AbstractType). The drawback is that this approach wouldn't work for bits-type parameters like integers, since <:1 is undefined iirc. BUT! once we have something like Array{String,::Int} to denote that the 2nd parameter must be an integer, we might be able to allow ->Array{String, _::T} as (x::T)->Array{String, x}

rapus95 avatar Dec 05 '20 21:12 rapus95

Regarding _1, _2 to denote the argument order, I still have mixed feelings. On the one hand it reduces readability since it's no longer simply "put in from left to right", and on the other hand it doesn't save much anymore since now you need 2 characters to denote a single variable; a normal lambda would need 3 characters (definition, comma in the argument list, and usage).

rapus95 avatar Dec 05 '20 21:12 rapus95

The other advantage of numbering is that it lets you re-use the same argument:

sort(xs, by= -> _.re + _.im/100)  # x -> x.re + x.im/100
sort(xs, lt= -> _1.re < _2.re)    # (x,y) -> x.re < y.re

Edit -- You could argue that the first line only saves one character, and the second 3 not 5. But what it does still save is the need to invent a name for the variable.

Re-using the same symbol with different meanings in an expression seems confusing to me. Is it ever actually ambiguous? (The order of symbols in +(a,b) and a+b differs, with the same Expr, but _ will never be infix. So perhaps that cannot happen?)

Edit' -- Not the most convincing example, but notice that 1,2,3 occur in order in the lowered version of this, as Iterators.filter puts the condition before the argument, but the comprehension puts it afterwards:

dump(:(  [f(x,1) for x in g(z,3) if h(x,2)]  ), maxdepth=10)

mcabbott avatar Dec 05 '20 21:12 mcabbott

regarding _1,_2 to denote the argument order still makes me have mixed feelings. On the one hand it reduces the readability since it's no more simply "put in from left to right" and on the other hand it doesn't save much anymore since now you need 2 characters to denote a single variable. a normal lambda would need 3 characters (definition, comma in argument list and usage)

You can continue to use _ for that; the numbered underscores are just a syntax for referring to arguments by their position.

In Mathematica and Clojure, _ always refers to the first argument, and numbered underscores are usually used for the second/third/... arguments.

So @mcabbott's first example works as-is and the second example can also be written as

sort(xs, lt= -> _.re < _2.re)    # (x,y) -> x.re < y.re

yurivish avatar Dec 05 '20 21:12 yurivish

To be honest, I'm absolutely against making every single underscore refer to the same argument, because we'd lose the entire convenience for the multi-argument cases only to save 1(!!) character in rare situations... ->_.re + _.im/100 vs x->x.re + x.im/100 is just not worth it. So IMO each underscore should denote its own argument, from left to right. With that approach, compared to @mcabbott's suggestion, we'd save 2 characters for the different-argument case and only lose a single character for the same-argument case (which btw would currently be handled by the tight-binding approach of #24990). I.e. whenever it feels like multiple underscores should denote the same element, just use an ordinary lambda (it'll only cost you a single extra character).

rapus95 avatar Dec 05 '20 22:12 rapus95

The numbering approach won't be possible until 2.0 anyway, since _2 etc. are currently valid identifiers and thus that would be breaking AFAIU.

rapus95 avatar Dec 05 '20 22:12 rapus95

the numbering approach won't be possible until 2.0 anyway since _2 etc are currently valid identifiers and thus that would be breaking AFAIU

I think it wouldn't be breaking if _2 only means the second argument inside a headless anonymous function.

Good points re: just using an ordinary lambda. I'm curious what fraction of anonymous functions would be made shorter by the "headless" type. It'd be hard to measure accurately, since anonymous functions are used a lot in interactive non-package code.

yurivish avatar Dec 05 '20 22:12 yurivish

Re-using the same symbol with different meanings in an expression seems confusing to me

While certainly true for most characters, the underscore already has the meaning of "fill in whatever you get, I won't refer to it anywhere else". See underscore on the left-hand side, where it's used to say that we don't need the result, and as a parametric argument to denote "I don't care about the actual type". So I'd try to use that freedom to not be bound to ordinary variable behavior. If I wanted ordinary behaviour for it, I could just use an ordinary variable 👀 If we were crazy we could just bind any variable which would otherwise result in an UndefVarError 😂 but I'm strongly against that.

rapus95 avatar Dec 05 '20 22:12 rapus95

I'm in agreement with @rapus95 here: let's stick to the simple win with _ for positional arguments. Anything more complicated seems to me better expressed with named arguments.

StefanKarpinski avatar Dec 05 '20 23:12 StefanKarpinski

Oh, one more thing to consider: interaction with |>. With just the basic proposal here, one would often need to write something like this:

collection |> -> map(lowercase, _) |> -> filter(in(words), _)

We could either introduce a new pipe syntax as a shorthand for this, or say that |> also acts as a headless lambda delimiter in the absence of ->, so that if there are unbound _ in the expression following the |>, the -> is implicitly inserted after the |> for you. So you'd write the above like this instead:

collection |> map(lowercase, _) |> filter(in(words), _)

This does allow using another headless lambda for the filter/map operation, like this for example:

collection |> map(-> _ ^ 7, _) |> filter(-> _ % 3 == 0, _)
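For reference, the first pipeline above spelled out in syntax that works today (parentheses added so each lambda stays inside its own pipeline stage; xs is just a throwaway name):

collection |> (xs -> map(lowercase, xs)) |> (xs -> filter(in(words), xs))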

Might want to do something similar with <| for symmetry. It feels a little weird to single out these two, but I can't see anything more general that makes much sense to me.

StefanKarpinski avatar Dec 06 '20 15:12 StefanKarpinski

I love that idea tbh because it would give us some part of #24990 for "free". The only thing that holds me back is that, in that case, applying the same rule to \circ would probably make sense as well, and then it starts to feel like arbitrary special-casing again...

Btw, since the headless approach is currently primarily developed around multiple-argument cases, it'd be very nice if we had a syntactical solution for splatting. Otherwise cases like (1,2) |> (x,y)->x+y won't work anyway, since we'd have to wrap the anonymous function in Base.splat.

EDIT: could we use |>> as a head that automatically splats Tuples? It already has that extra >, which somewhat hints at ->, and if it's made specifically for that we could even include splatting, since it would be a special operator that revolves around chaining anonymous functions.
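For reference, the Base.splat wrapping mentioned above already works today:

(1, 2) |> Base.splat((x, y) -> x + y)   # 3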

rapus95 avatar Dec 06 '20 17:12 rapus95

Could syntax like map(... -> sin(_) - cos(_), xy_tuples) work?

yurivish avatar Dec 06 '20 18:12 yurivish

I think argument 1 could be 1 underscore (_), argument 2 could be 2 underscores (__), etc. This is how things are currently done in the queryverse. Ref https://www.queryverse.org/Query.jl/stable/experimental/#The-_-and-__-syntax-1 and https://github.com/JuliaLang/julia/pull/24990#issuecomment-431960201 and https://github.com/JuliaLang/julia/pull/24990#issuecomment-449446671

bramtayl avatar Dec 06 '20 18:12 bramtayl

@bramtayl having every single underscore reference the same single argument doesn't scale well for the headless syntax; read https://github.com/JuliaLang/julia/issues/38713#issuecomment-739422316 for why. And needing the same argument in multiple places is also a comparatively rare case.

rapus95 avatar Dec 06 '20 19:12 rapus95

Hmm, needing to use the same argument in multiple places happens all the time in querying though, I think. Consider processing a row of a table: you might need to reference several fields of the row.

bramtayl avatar Dec 06 '20 19:12 bramtayl

How many more characters would you need if you switched to using an ordinary lambda with a single-letter variable name instead? (Remember that you need the -> in any case.)

rapus95 avatar Dec 06 '20 19:12 rapus95

The extra _ for the second argument seems to me to be a very small price to pay for the flexibility of using an argument as many times as you want, is all

bramtayl avatar Dec 06 '20 19:12 bramtayl

a) It doesn't scale to more distinct arguments. b) It's a huge price! We waste an entire syntax to save a single character compared to an ordinary lambda. You will never be able to save more than a single character if multiple underscores denote the same argument.

rapus95 avatar Dec 06 '20 19:12 rapus95

It seems to me likely that wanting to make a two argument anonymous function will be much less common than wanting to reference an argument more than once

bramtayl avatar Dec 06 '20 19:12 bramtayl

@bramtayl do the maths yourself: it scales very badly (i.e. negatively) if you intend to use any argument other than the first more than once, or if you intend to use more than 3 arguments. So multiple chained underscores just don't benefit us for the general-purpose case.

_i for i being a single-digit number would still be a better proposal for that case, both for scaling in the number of uses per argument and in the number of arguments. But that can live in its own issue since it is orthogonal to the current proposal.

rapus95 avatar Dec 06 '20 20:12 rapus95

_i would definitely work too. I suppose I could go through some code and count the number of times you have different numbers of arguments in anonymous functions and the number of times you reuse an argument. Even though I suspect that reusing an argument will be far more common, it doesn't matter too much, because I think it would be nice to have a syntax flexible enough to do both. Do you know of any use-cases for three-argument anonymous functions?

bramtayl avatar Dec 06 '20 20:12 bramtayl

Just include a lot of code samples which use reduce and similar functions; if you only include code that maps data, it would be an unfair comparison. But this already shows what I'm talking about: we don't want a domain-specific syntax feature in a general-purpose language. And the queryverse definitely is domain-specific. It already has a macro for that exact case, which doesn't seem to have made it outside of that domain.

rapus95 avatar Dec 06 '20 20:12 rapus95

the underscore already has the meaning of "fill in whatever you get, I won't refer to it anywhere else" See underscore as left hand side where it's used to tell that we don't need the result; and as a parametric argument to denote "I don't care about the actual type"

I think more fundamentally, the principle behind _ is something like "I want to avoid giving an arbitrary name to a value". As of now, you can avoid giving an arbitrary name if

  • you don't need to access the value that would be named arbitrarily
(_, r) = divrem(1,2)
foo(_) = 3
foo(nothing)
  • (less famously) there is only one field in the struct, which is perfect for "wrapper" types
struct Foo
    _::Int64
end

Foo(3)._

Note this usage is a counterexample to _ meaning "I won't refer to it". (xref: https://github.com/JuliaLang/julia/issues/37381)

It seems (at least) roughly consistent with this principle that Array{_, 3} mean Array{<:Any, 3}, which is Array{var"#s27", 3} where var"#s27", where the need for an arbitrary name is fulfilled by something like gensym.

The use of _ as an arbitrary name for a type parameter, combined with the proposal here, leads to a possible ambiguity already mentioned. To be sure, consider the interaction with _ as a field name:

x->x._ could be written as ->_._. I do not think there is any chance for ambiguity here, the same way that there is no chance for ambiguity with x->x.x.

I agree that it would be more useful if each _ in the anonymous body referred to a subsequent argument, as opposed to the alternative that _ always refers to the first argument. I'm not sure if this choice can be justified as a natural consequence of _ meaning "avoid arbitrary name", and that the alternative cannot be, but it kind of feels that way. Certainly if _ is used as a type parameter, each _ would be a unique parameter.

goretkin avatar Dec 07 '20 03:12 goretkin

Hmm, well, a quick audit of non-single-argument-0-or-1-mention uses of -> in julia/base, excluding splats, is below. A couple of notes:

  • 3 argument anonymous functions seem extremely rare; I didn't see any 4 argument ones.
  • At least from this sample, reusing an argument in an anonymous function is about as common as having multiple arguments.
  • I did this relatively quickly, so I probably missed some

Multiple arguments

sum(map((i, s, o)->s*(i-o), J, strides(x), Tuple(first(CartesianIndices(x)))))*elsize(x)
foldr((v, a) -> prepend!(a, v), iter, init=a)
(r,args) -> (r.x = f(args...))
(i,args) -> (itr.results[i]=itr.f(args...))
((p, q) -> p | ~q))
((p, q) -> ~p | q))
((p, q) -> ~xor(p, q)))
((p, q) -> ~p & q))
((p, q) -> p & ~q)))
map((rng, offset)->rng .+ offset, I.indices, Tuple(j))
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
foldl((x1,x2)->:($x1 || ($expr == $x2)), values[2:end]; init=:($expr == $(values[1])))
retry(http_get, check=(s,e)->e.status == "503")(url)
retry(read, check=(s,e)->isa(e, IOError))(io, 128; all=false)
dict_with_eltype((K, V) -> IdDict{K, V}, kv, eltype(kv))
CartesianIndices(map((i,j) -> i:j, Tuple(I), Tuple(J)))
CartesianIndices(map((i,s,j) -> i:s:j, Tuple(I), Tuple(S), Tuple(J)))
map((isrc, idest)->first(isrc)-first(idest), indssrc, indsdest)
(x,y)->isless(x[2],y[2])
(x, y) -> lt(by(x), by(y))
(io, linestart, idx) -> (print(io, idx > 0 ? lpad(cst[idx], nd+1)
(mod, t) -> (print(rpad(string(mod) * "  ", $maxlen + 3, "─"));
(f, x) -> f(x)
(f, x) -> wait(Threads.@spawn f(x))
afoldl((ys, x) -> f(x) ? (ys..., x) : ys, (), xs...)
Base.dict_with_eltype((K, V) -> WeakKeyDict{K, V}, kv, eltype(kv))
simple_walk(compact, lifted_val, (pi, idx)->true)
(io::IO, indent::String, idx::Int) -> printer(io, indent, idx > 0 ? code.codelocs[idx] : typemin(Int32))

Reuses an argument

dst::typeof(I) = ntuple(i-> _findin(I[i], i < n ? (1:sz[i]) : (1:s)), n)::typeof(I)
src::typeof(I) = ntuple(i-> I[i][_findin(I[i], i < n ? (1:sz[i]) : (1:s))], n)::typeof(I)
CartesianIndices(ntuple(k -> firstindex(A,k):firstindex(A,k)-1+@inbounds(halfsz[k]), Val{N}()))
CartesianIndices(ntuple(k -> k == dims[1] ? (mid:mid) : (firstindex(A,k):lastindex(A,k)), Val{N}()))
all(d->idxs[d]==first(tailinds[d]),1:i-1)
map(x->string("args_tuple: ", x, ", element_val: ", x[1], ", task: ", tskoid()), input)
foreach(x -> (batch_refs[x[1]].x = x[2]), enumerate(results))
map(v -> Symbol(v[1]) => v[2], split.(tag_fields, "+"))
findlast(frame -> !frame.from_c && frame.func === :eval, bt)
ntuple(n -> convert(fieldtype(T, n), x[n]), Val(N))
map(chi -> (chi.filename, chi.mtime), includes)
filter(x -> !(x === empty_sym || '#' in string(x)), slotnames[(kwli.nargs + 1):end])
ntuple(i -> i == dims ? UnitRange(1, last(r[i]) - 1) : UnitRange(r[i]), N)
ntuple(i -> i == dims ? UnitRange(2, last(r[i])) : UnitRange(r[i]), N)
map(n->getfield(sym_in(n, bn) ? b : a, n), names)
filter!(x->!isempty(x) && x!=".", parts)
all(map(d->iperm[perm[d]]==d, 1:N))
ntuple(i -> i == k ? 1 : size(A, i), nd)
ntuple(i -> i == k ? Colon() : idx[i], nd)
map(x->x isa Integer ? UInt64(x) : String(x), pre)
map(x->x isa Integer ? UInt64(x) : String(x), bld))
_any(t -> !isa(t, DataType) || !(t <: Tuple) || !isknownlength(t), utis)
 _all(i->at.val[i] isa fieldtype(t, i), 1:n)
filter(ssa->!isa(ssa, SSAValue) || !(ssa.id in intermediaries), useexpr.args[(6+nccallargs):end])
findfirst(i->last_stack[i] != stack[i], 1:x)
 x -> (x = new_nodes_info[x]; (x.pos, x.attach_after))
filter(p->p != 0 && !(p in bb_defs), cfg.blocks[bb].preds)
filter(ex -> !(isa(ex, LineNumberNode) || isexpr(ex, :line)), ex.args)

bramtayl avatar Dec 07 '20 03:12 bramtayl

there is only one field in the struct, which is perfect for "wrapper" types

@goretkin that case is perfectly handled by interpreting it as 0-d and using x[] (as Ref does). The "whatever comes, I won't refer to it" meaning, otoh, was the only reason why the underscore became reserved; that's the reason why it must not be used as a right-hand side.

@bramtayl would you be willing to translate these cases into both (or, even better, all 3) variants, i.e. _ _, _ __, and _1 _2, and measure the number of characters saved compared to the ordinary lambda?

rapus95 avatar Dec 07 '20 11:12 rapus95

Thanks for gathering these, @bramtayl.

In the "multiple arguments" list, it looks like 12/28 don't follow the simple pattern of using every argument, exactly once, in order, and not as a type parameter.

2 of those simply drop trailing arguments, (x, _...) -> stuff, which raises the question (not so-far discussed?) of whether these headless lambdas should in general accept more arguments than they use, or not. Should map(->nothing, xs) work?

Drop last:
simple_walk(compact, lifted_val, (pi, idx)->true)
(mod, t) -> (print(rpad(string(mod) * "  ", $maxlen + 3, "─"));
Drop first or middle:
retry(http_get, check=(s,e)->e.status == "503")(url)
retry(read, check=(s,e)->isa(e, IOError))(io, 128; all=false)
(io, linestart, idx) -> (print(io, idx > 0 ? lpad(cst[idx], nd+1)
Shuffle:
sum(map((i, s, o)->s*(i-o), J, strides(x), Tuple(first(CartesianIndices(x)))))*elsize(x)
Re-use:
afoldl((ys, x) -> f(x) ? (ys..., x) : ys, (), xs...)
foldr((v, a) -> prepend!(a, v), iter, init=a)
(io::IO, indent::String, idx::Int) -> printer(io, indent, idx > 0 ? code.codelocs[idx] : typemin(Int32))
Type parameters:
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
dict_with_eltype((K, V) -> IdDict{K, V}, kv, eltype(kv))
Base.dict_with_eltype((K, V) -> WeakKeyDict{K, V}, kv, eltype(kv))

Simple cases (every parameter used exactly once, in order) [Edit -- now with some brackets removed]:

(r,args) -> (r.x = f(args...))
(i,args) -> (itr.results[i]=itr.f(args...))
((p, q) -> p | ~q )
((p, q) -> ~p | q )
((p, q) -> ~xor(p, q))
((p, q) -> ~p & q)
((p, q) -> p & ~q)
map((rng, offset)->rng .+ offset, I.indices, Tuple(j))
foldl((x1,x2)->:($x1 || ($expr == $x2)), values[2:end]; init=:($expr == $(values[1])))
CartesianIndices(map((i,j) -> i:j, Tuple(I), Tuple(J)))
CartesianIndices(map((i,s,j) -> i:s:j, Tuple(I), Tuple(S), Tuple(J)))
map((isrc, idest)->first(isrc)-first(idest), indssrc, indsdest)
(x,y)->isless(x[2],y[2])
(x, y) -> lt(by(x), by(y))
(f, x) -> f(x)
(f, x) -> wait(Threads.@spawn f(x))

... which could become (with each _ a new argument)

-> (_.x = f(_...))
-> (itr.results[_]=itr.f(_...))
(-> _ | ~_ )
(-> ~_ | _ )
(-> ~xor(_, _))
(-> ~_ & _ )
(-> _ & ~_ )
map(->_ .+ _, I.indices, Tuple(j))
foldl(->:($_ || ($expr == $_)), values[2:end]; init=:($expr == $(values[1])))
CartesianIndices(map(-> _:_, Tuple(I), Tuple(J)))
CartesianIndices(map(-> _:_:_, Tuple(I), Tuple(S), Tuple(J)))
map(->first(_)-first(_), indssrc, indsdest)
->isless(_[2],_[2])
-> lt(by(_), by(_))
-> _(_)
-> wait(Threads.@spawn _(_))

Interesting how few from either list above would be clearer (IMO) without naming variables -- most are quite long & complicated. So another possibility to consider is that this headless -> syntax could be restricted to zero or one arguments [Edit -- I think I meant to say, "used at most once"], at least initially. To emphasise that it's for writing short things, where clarity may be improved by not having to name the variable.

I'm not sure that counting characters saved is a great measure, as the cases where you could save the most letters also seem like the ones complicated enough that you ought to be explicit. [Nor is counting how many cases in Base, really.] But using |> as a fence seems neat (it's visually -> with the minus rotated, right?) and means that some one-argument cases could become quite a bit shorter & less cluttered. For example you don't have to think about whether it's confusing to re-use the same name for vs & xs here:

[rand(Int8,5) for _ in 1:7] |> vs -> reduce(vcat, vs) |> xs -> filter(x -> x%3 != 0, xs)

collection |> reduce(vcat, _) |> filter(-> _%3 != 0, _)

mcabbott avatar Dec 07 '20 13:12 mcabbott

tbh I just find some of the spacings hard to read, like here: -> ~_ | _, but I guess that has to do with not being used to unary operators in front of an underscore 😄 that'll come with time. Btw, you have more closing than opening parentheses 🙈 Regarding the readability (especially with p and q), I find the underscore variant a lot more readable, since I know that I can rely on a) where the closure manifests and b) in which order the arguments are filled in, both in a way where I don't have to scan forward and backward but simply from left to right, once.

I intended the focus to be on >1-arg cases, since going from 1 to 2 args costs a lot more characters and thus focusing on single-argument cases just doesn't bring profit at all (except that single saved character). But on 2-arg, 1-use-each cases the proposal indeed shines IMO. Whether more arguments should use explicit naming is a matter of taste and situation, I guess.

rapus95 avatar Dec 07 '20 14:12 rapus95

there is only one field in the struct, which is perfect for "wrapper" types

@goretkin that case is perfectly handled by interpreting it as 0-d and using x[] (as Ref does) the "whatever comes I won't refer to it" otoh was the only reason why underscore became reserved. That's the reason why it must not be used as right hand side.

@rapus95, I disagree that that is perfect, especially when you would like getindex to have another meaning than "access my one field", which is the case for all of these:

julia> allsubtypes(T) = (s = subtypes(T); union(s, (allsubtypes(K) for K in s)...)) # https://discourse.julialang.org/t/reduce-on-recursive-function/24511/5?u=goretkin
allsubtypes (generic function with 1 method)

julia> allsubtypes(AbstractArray) |> x->filter(!isabstracttype, x) |> x->filter(T->1==fieldcount(T), x) |> x -> map(T->(T, only(fieldnames(T))), x)
25-element Vector{Tuple{UnionAll, Symbol}}:
 (Adjoint, :parent)
 (Base.SCartesianIndices2, :indices2)
 (CartesianIndices, :indices)
 (Core.Compiler.LinearIndices, :indices)
 (Diagonal, :diag)
 (LinearIndices, :indices)
 (PermutedDimsArray, :parent)
 (SuiteSparse.CHOLMOD.FactorComponent, :F)
 (Test.GenericArray, :a)
 (Transpose, :parent)
 (UpperHessenberg, :data)
 (Base.IdentityUnitRange, :indices)
 (Base.OneTo, :stop)
 (Base.Slice, :indices)
 (Core.Compiler.IdentityUnitRange, :indices)
 (Core.Compiler.OneTo, :stop)
 (Core.Compiler.Slice, :indices)
 (Base.CodeUnits, :s)
 (Base.Experimental.Const, :a)
 (SuiteSparse.CHOLMOD.Dense, :ptr)
 (LowerTriangular, :data)
 (UnitLowerTriangular, :data)
 (UnitUpperTriangular, :data)
 (UpperTriangular, :data)
 (SuiteSparse.CHOLMOD.Sparse, :ptr)

I am not saying that all of those field names are arbitrary. stop evokes a useful meaning. data (and to a lesser extent parent) does not. The name ptr does not convey any additional information beyond the type Ptr{...}.

I am not sure if you were just objecting to my characterization of _, or if you think using _ as a field name is bad. To emphasize a previous point, using it as a field name does not interfere with the proposal here.

goretkin avatar Dec 07 '20 17:12 goretkin

Just eyeballing it, the lambdas where an argument is referred to multiple times tend to be relatively long and complex and often have fairly long argument names, whereas the lambdas where there are multiple arguments that are each used once in order tend to be relatively short and simple. Since the aim of this feature is to make it easier to write short, simple lambdas, this would seem to favor the originally proposed behavior.

StefanKarpinski avatar Dec 07 '20 17:12 StefanKarpinski

Ok, I can see that. How about another argument: having _ mean two different things right next to each other violates the principle of least surprise?

bramtayl avatar Dec 07 '20 18:12 bramtayl

Another way of looking at it is this: why is it so annoying to give names to x and y in a lambda like (x, y) -> x[y]? It's not really that much typing. The issue is really that it shifts the focus, both when writing and reading the code, to these meaningless x and y variables, and they are not the point; the key thing is the indexing action. It's annoying when writing it because you have to come up with names that don't matter, and it's annoying when reading it because there's so much noise that gets in the way of the action, which is what you're trying to express. Using "wildcard names" and writing _[_] instead puts the focus where it belongs: on the indexing as an action. So being able to write _[_] is not only shorter, but also clearer: it's immediately clear that the thing being expressed is the action of indexing.

How does that interact with whether _[_] should mean (x, y) -> x[y] or x -> x[x]? Well, the former is the generic act of indexing, naturally a two-argument action. The latter is the action of self-indexing, which is a pretty unusual and niche action. With the alternate proposal, how would one express "the action of indexing"? You'd have to write _1[_2]... which is imo firmly back in the territory of letting the arguments dominate; I'd even argue that writing (x, y) -> x[y] is clearer.

Let's take another example: (_ + _) / _. What should it mean? In the original proposal, it means (x, y, z) -> (x + y) / z, and captures the action of adding two things and dividing them by a third thing. In the alternate proposal, it means x -> (x + x) / x and captures the (bizarre and useless) action of adding a thing to itself and then dividing by that same thing. This example may seem silly, but I think the key insight is that the real power and benefit of headless lambdas is that it allows you to write the syntax of an action with "holes" where the arguments go and capture the "essence" of that action in a way that syntactically focuses on the action instead of the names.

StefanKarpinski avatar Dec 07 '20 18:12 StefanKarpinski

Hmm, well, still not convinced. One final argument: there was an argument above that querying tables is a niche application, but I'd argue it's far and away the most common use of numerical programming. If you combine all the users of SQL, LINQ, dplyr, Query.jl, and DataFramesMeta.jl, you'd get half of stackoverflow IMHO. And not being able to refer to a row more than once is a deal-breaker for querying. A prototypical example would be c = _.a + _.b.

In either case, I think conservatively making a headless -> with zero or one _ after work and integrate with |> would be a great improvement.

bramtayl avatar Dec 07 '20 20:12 bramtayl

The thing is that people who want to write row queries don't even want to write the _. at all; they just want the row object to be fully implicit and just write a + b. The syntax :a + :b is already a de facto standard among Julia packages for row operations and allows not mentioning the row object. It seems like a step backwards for them to write _.a + _.b instead.

StefanKarpinski avatar Dec 07 '20 20:12 StefanKarpinski

I just can't help but feel that requiring a glyph at the beginning of the expression defeats the purpose.

-> foo(_, 1)

is literally the same number of characters in the same positions as

@_ foo(_, 1)

but requires new parser support, rather than a simple macro definition. What's the point?

The reason people complain about @_ is that they don't want to backtrack after they've already written foo. I really think that a glyph at the end of the expression would be a better solution to this problem. e.g.

foo(_, 1) _@

I've proposed foo bar@ as postfix macrocall syntax equivalent to @bar foo before, but that offended many people's sensibilities. Perhaps we can instead agree on a single postfix glyph specifically for this problem, rather than allowing all macros to be used in postfix? I still maintain though that postfix macros would be a more general solution.

MasonProtter avatar Dec 07 '20 21:12 MasonProtter

I think the :a syntax for getting a field from a row is confusing because it is the same syntax as Symbol("a"). I think _.a + _.b is clearer and just as convenient (and it's what Query uses). I don't think dot overloading and named tuples even existed when DataFramesMeta was written. To a certain extent, a proposal like this could even make the macro-processing side of Query obsolete, or to put it another way, it would allow the entire ecosystem to benefit from the syntax.

bramtayl avatar Dec 08 '20 00:12 bramtayl

I suppose I should also mention that the proposal here is very similar to the @_ and @> macros in LightQuery, which is part of why I'm so excited about it. I had to use some hacks to get the macros to resolve inside out (like functions) rather than outside in (like macros). It would be a lot nicer if they were officially supported as function syntax.

bramtayl avatar Dec 08 '20 00:12 bramtayl

I think the :a syntax for getting a field from row is confusing because it is the same syntax for Symbol("a"). I think _.a + _.b is clearer and just as convenient (and it's what Query uses).

:a being the same as Symbol("a") is the whole intent here. Just as a slight reminder: _.a == getproperty(_, :a), so yes, that's a Symbol. :name is a widespread way to index by column name. And since these are ordinary objects in Julia, they take part in everything else, like grouping/broadcasting: getindex.(obj, [:a, :b, :c]), so having Symbols as selectors is a good thing anyway. And macros deliver the domain-specific convenience.

To a certain extent, a proposal like this could even make the macro processing side of Query obsolete, or to put it another way, it would allow the entire ecosystem to benefit from the syntax.

Of course, if we embed domain-specific features into the Base language, then that makes the domain-specific features obsolete. Only drawback: we arrive at a domain-specific language which excels in its domain but feels bloated for everything else (i.e. a lot of features that aren't useful in generic cases). And tbh I don't want Julia to be a domain-specific language.

rapus95 avatar Dec 09 '20 17:12 rapus95

I'm not convinced of the general usefulness of this construct versus making code less readable (and maintainable) -- this is an additional syntax burden on "accidental programmers" for whom using Julia is not their primary job. I think the wide-open interpretation of _[_] is a perfect example of why this is not a particularly great idea. How about an off-the-wall suggestion -- why not use bare subscripts? -> ₁[₁] at least is unambiguous, (x₁,x₂,...) -> x₁[x₁], and then -> ₂[₁] could mean (x₁,x₂,...) -> x₂[x₁]. I'd worry less about how easy it is to type than about how easy it is to read ~18 months later when you're fixing a bug. If you wish to keep _ for the 1-argument case, then we could use subscripts to access the nth item in the input... -> _₂[_₁] could mean (x₁,x₂,...) -> x₂[x₁].

clarkevans avatar Dec 11 '20 16:12 clarkevans

Have you considered using broadcasting instead? That is, lift f.(_) to _ -> f(_). Add special sugar for getproperty and getkey: _.x = _ -> _.x, and then _.x .> _.y will be translated to _ -> ((_ -> _.x)(_) > (_ -> _.y)(_)), which can be used as a predicate function.

Perhaps it could even support multiple arguments, by defining _(xs...) = _(xs).

Pipelining with |> could be supported by introducing one-argument curried forms of map() and filter():

1:10 |> filter(isodd.(_)) |> map(_ .* 2) |> foldl(_[1] .+ _[2])
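For reference, a minimal sketch of such one-argument curried wrappers under hypothetical names (cfilter, cmap, cfoldl are not in Base; this sketch leaves out the broadcast-lifting part of the idea):

cfilter(f) = xs -> filter(f, xs)
cmap(f)    = xs -> map(f, xs)
cfoldl(op) = xs -> foldl(op, xs)

1:10 |> cfilter(isodd) |> cmap(x -> 2x) |> cfoldl(+)   # 50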

xitology avatar Dec 11 '20 16:12 xitology

making code less readable (and maintainable)

In my opinion neither of that is true since the use case for that new syntax is mostly compact multiple argument anonymous functions which are needed to partially apply arguments (and, since it's syntactical, in the future, it might even resolve into something more elaborate like a generalization of the Fix1/Fix2 types including actual partial pre-optimization). In ->g(_, a, h(_), _, f) as opposed to (x,y,z)->g(x,a,h(y),z,f) I, to be honest, find the first case way more readable. Regarding maintainability, I don't see where that would need any maintenance.

Regarding your proposal of using subscript numbers for indexing, I really like the idea for more complex cases, but it's an orthogonal proposal since they don't conflict, and subscripts are indeed more cumbersome to type for very straightforward cases like my example above.

Have you considered using broadcasting instead?

That's not an option since we want ->f.(_) to mean x->f.(x)

1:10 |> filter(isodd.(_))

with your proposal that would error since it would resolve to 1:10 |> x->filter(isodd(x)) which in turn would be the same as filter(isodd(1:10))

rapus95 avatar Dec 11 '20 17:12 rapus95

@clarkevans I'm not sure bare subscripts could work until 2.0 because I think they don't currently error, but I do like the idea of _₁ _₂ etc., especially if it would make it clearer what's going on. EDIT: nevermind

bramtayl avatar Dec 11 '20 17:12 bramtayl

@bramtayl they do error:

julia> ₂=2
ERROR: syntax: invalid character "₂" near column 1
Stacktrace:
 [1] top-level scope at none:1

I had to check that myself though, since I wanted to write the same as you 😂

rapus95 avatar Dec 11 '20 17:12 rapus95

1:10 |> filter(isodd.(_))

with your proposal that would error since it would resolve to 1:10 |> x->filter(isodd(x)) which in turn would be the same as filter(isodd(1:10))

It works, here's the proof of concept (using it instead of _): https://gist.github.com/xitology/347e01ed012e9e2a5530c719b7163500

xitology avatar Dec 11 '20 18:12 xitology

@xitology If you haven't seen it, https://github.com/tkf/UnderscoreOh.jl uses broadcast like this, as a marker of where the function stops.

@rapus95 I think filter(isodd.(_), 1:10) resolves to filter(x -> isodd(x), 1:10), not to x -> filter(isodd(x), 1:10), in this proposal. The end of the broadcasting tells it where to stop.

That said, it does seem pretty confusing to re-use broadcasting for something unrelated, and will mean you can't use actual broadcasting at the same time.

mcabbott avatar Dec 11 '20 18:12 mcabbott

@xitology a) Still, we don't want to lose the broadcasting ability to allow for a new syntax. b) In your example you didn't have the same scoping mechanics as we are discussing here, i.e. 1:10 |> filter(isodd.(_)) would resolve to 1:10 |> ->filter(isodd.(_)), which in turn would resolve to 1:10 |> x->filter(isodd(x)) according to you (and that in turn would error). For yours to work you would need an inner left barrier, i.e. something like 1:10 |> filter(->isodd.(_)); then that would work, but I don't yet get where the benefit would be in using broadcasting for it, especially since we'd lose actual broadcasting.

And please consider using some other example, since your example is written the shortest and most concisely by not using any special syntax at all: 1:10 |> filter(isodd) 🙈

rapus95 avatar Dec 11 '20 18:12 rapus95

@mcabbott Thanks, I wasn't aware of it, but it's not surprising that someone did it before. Extending functions to combinators is a natural application of broadcasting.

@rapus95 The example works, see line 35 in the proof of concept https://gist.github.com/xitology/347e01ed012e9e2a5530c719b7163500#file-it-jl-L35.

Broadcasting is designed to work with arbitrary containers and fits very naturally with this particular case. We just need to consider a function as a "container" where the argument of the function plays the role of the key. Then broadcasting a function over a functional argument reduces to function composition. Taking identity as the starting point, we can convert any broadcasted expression into a function.

xitology avatar Dec 11 '20 18:12 xitology

@mcabbott oh thanks, now I understand the proposal. The idea is to use the broadcasting dot as the barrier instead of ->. Well then, only my first point still holds. Mixing broadcasting and anonymous function definition feels like it has only drawbacks compared to using -> as the barrier. (regarding feature orthogonality and feature compatibility)

So f.(_.(_)) would resolve to x->f(y->x(y)), right? And f.(_(_)) would resolve to (x, y)->f(x(y)). So far that looks good, but it isn't compatible with broadcasting, which is a showstopper IMO: map(->_.+_, A, B) wouldn't be possible with that proposal. If we use broadcasting, we'd have to actually give underscores a value, wouldn't we?

rapus95 avatar Dec 11 '20 19:12 rapus95

So far that looks good, but it isn't compatible with broadcasting which is a showstopper IMO

@rapus95 What compatibility issues are you concerned about? You could use . to build a function, and then use . to broadcast it over an array, for example:

julia> @show (it .> 5).(1:10)
(it .> 5).(1:10) = Bool[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
julia> @show (it[1] .+ it[2]).(1:10, 11:20)
(it[1] .+ it[2]).(1:10, 11:20) = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]

xitology avatar Dec 11 '20 19:12 xitology

In ->g(_, a, h(_), _, f) as opposed to (x,y,z)->g(x,a,h(y),z,f) I, to be honest, find the first case way more readable.

You've chosen to bind _ to different inputs over the scope of the expression -- I don't find it to be very intuitive at all. Moreover, it is quite special-purpose, where the arguments on the RHS happen to exactly correspond to what is on the LHS. That's a rather specific alignment of the stars, no?

Regarding maintainability, I don't see where that would need any maintenance.

I'm talking about maintenance in the regular sense -- where someone unfamiliar with the program, and perhaps even the programming language (Julia is often used by data scientists and other accidental programmers), has to look at a piece of code and make sure the logic is doing what is expected, perhaps making adjustments to comply with the realities of an ever-changing world.

If you want to use _, perhaps one could use it to represent the entire input. So, for simple lambdas, it represents the 1st argument since there is only one argument.

x -> x^2 => -> _^2

but... let's say the input is a tuple....

(x,y) -> x^2 => ->_₁^2 (or perhaps just -> ₁^2 or -> _[1]^2)...

In the case of a tuple input, -> _^2 would give something like... ERROR: MethodError: no method matching ^(::Tuple{...}, Int64)

Regardless, I dislike the whole idea.

clarkevans avatar Dec 11 '20 19:12 clarkevans

@xitology it wouldn't be compatible with #24990; _.+5 is where both proposals collide.

Also, I don't like that the meaning of the dot depends on subsequent code. The whole subjective beauty of my proposal was that it's enough to scan once from left to right.

mod.(_,10) reads as if it allows using a collection as the argument. I don't like that a placeholder which is meant to be used as such transforms the meaning of the surrounding code. It feels like an anti-feature somehow.

On the other hand, I feel like there are 2 possible implementations for that proposal: a) syntactical -- in that case, if we have to create new parsing behavior anyway, why restrict broadcasting behavior by stealing its syntax? b) by using certain objects which automatically propagate through the extendable broadcasting machinery -- in that case I don't see the conflict with the original proposal except for using an underscore, but then we'd have to assign an actual object to the underscore, which we intentionally prohibited.

rapus95 avatar Dec 11 '20 19:12 rapus95

@clarkevans

You've chosen to bind _ to different inputs over scope of the expression

Yes, that was the exact purpose. Don't think of _ as a variable (since you can use variables to reference certain objects, which we disallowed for the underscore) but as a slot, almost physically. It's meant to be read as slots into which you drop the arguments from left to right. It's like having a stack of objects and a group of people: you can't give everyone the same object, but you assign objects to them in the order in which you take them off your argument list.

Moreover, it is quite special-purpose, where the arguments on the RHS happen to exactly correspond to what is on the LHS. That's a rather specific alignment of the stars, no?

Since I'm usually in charge of supplying the arguments to my functions myself I don't find that very special-purpose.

curried=->f(_,g,_,2)
r=curried(3,h)
return curried(8,r)
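For reference, the same sketch in current syntax (assuming f, g, and h are defined elsewhere):

curried = (x, y) -> f(x, g, y, 2)
r = curried(3, h)
curried(8, r)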

where someone unfamilar with the program, and perhaps even the programming language (Julia is used often by data scientists and other accidental programmers) have to look at a piece of code and make sure the logic is doing what is expected

That could also be used as an argument to forbid all types of reflection and metaprogramming. Heck, to not come up with anything more complex than natural language at all. And I hope that no one relies on a language foreigner for proofreading.

I feel like many here are arguing from the point of "we must not allow operator overloading because code can become non-trivial", while I argue from the point of "some cases of lambda usage (i.e. -> usage) could get a more concise feel if we deduplicated argument occurrences". That's also the reason why I'm still only proposing the very basic syntax, which consists of a single -> and arbitrary _ where each underscore represents its own argument, in order. I'm proposing a somewhat intuitive short syntax for -> (that's also a reason why I am not sure why one would use the broadcasting dot in place of something that already has the known and wanted meaning of constructing functions).

rapus95 avatar Dec 11 '20 20:12 rapus95

Because I've grown to quite like it, here are all of the anonymous functions I sampled from Base with the bare subscript syntax. Apologies for any mistakes.

-> ₂ * (₁ - ₃)
-> prepend!(₂, ₁)
-> ₁.x = f(₂...)
-> itr.results[₁] = itr.f(₂...)
-> ₁ | ~₂
-> ~₁ | ₂
-> ~xor(₁, ₂)
-> ~₁ & ₂
-> ₁ & ~₂
-> ₁ .+ ₂
-> Dict{₁, ₂}
-> :($₁ || ($expr == $₂))
-> ₂.status == "503"
-> isa(₂, IOError)
-> IdDict{₁, ₂}
-> ₁:₂
-> ₁:₂:₃
-> first(₁)-first(₂)
-> isless(₁[2], ₂[2])
-> lt(by(₁), by(₂))
-> (₂; print(₁, idx > 0 ? lpad(cst[₃], nd+1) : " "^(nd+1), " "); return "")
-> (print(rpad(string(₁) * "  ", $maxlen + 3, "─")); Base.time_print(₂ * 10^9); println())
-> ₁(₂)
-> wait(Threads.@spawn ₁(₂))
-> f(₂) ? (₁..., ₂) : ₁
-> WeakKeyDict{₁, ₂}
-> ₂; true
-> printer(₁, ₂, ₃ > 0 ? code.codelocs[₃] : typemin(Int32))
-> _findin(I[₁], ₁ < n ? (1:sz[₁]) : (1:s)
-> I[₁][_findin(I[₁], ₁ < n ? (1:sz[₁]) : (1:s))]
-> firstindex(A,₁):firstindex(A,₁)-1+@inbounds(halfsz[₁])
-> ₁ == dims[1] ? (mid:mid) : (firstindex(A,₁):lastindex(A,₁))
-> idxs[₁]==first(tailinds[₁])
-> string("args_tuple: ", ₁, ", element_val: ", ₁[1], ", task: ", tskoid()), input)
-> (batch_refs[₁[1]].x = ₁[2]), enumerate(results))
-> Symbol(₁[1]) => ₁[2]
-> !₁.from_c && ₁.func === :eval
-> convert(fieldtype(T, ₁), x[₁])
-> (₁.filename, ₁.mtime)
-> !(₁ === empty_sym || '#' in string(₁))
-> ₁ == dims ? UnitRange(1, last(r[₁]) - 1) : UnitRange(r[₁])
-> ₁ == dims ? UnitRange(2, last(r[₁])) : UnitRange(r[₁])
-> getfield(sym_in(₁, bn) ? b : a, ₁)
-> !isempty(₁) && ₁ != "."
-> iperm[perm[₁]] == ₁
-> ₁ == k ? 1 : size(A, ₁)
-> ₁ == k ? Colon() : idx[₁]
-> ₁ isa Integer ? UInt64(₁) : String(₁)
-> !isa(₁, DataType) || !(₁ <: Tuple) || !isknownlength(₁)
-> at.val[₁] isa fieldtype(t, ₁)
-> !isa(₁, SSAValue) || !(₁.id in intermediaries)
-> last_stack[₁] != stack[₁]
-> (₁ = new_nodes_info[₁]; (₁.pos, ₁.attach_after))
-> ₁ != 0 && !(₁ in bb_defs)
-> !(isa(₁, LineNumberNode) || isexpr(₁, :line))

Observations:

  • Counting the number of characters saved is left as an exercise to the reader, but seems to be significant IMHO
  • The syntax has no problem handling any of the anonymous functions. The only wrinkle is unused arguments, but -> ₂; true doesn't look half bad
  • I think for the most part it's pretty easy to read. The exception would be the anonymous functions that use integer indexing: ₁[1] is a bit confusing.

bramtayl avatar Dec 12 '20 01:12 bramtayl

Is it the case that each subscript appears as a single character but takes four keystrokes to type? \_2<tab>

yurivish avatar Dec 12 '20 01:12 yurivish

Yes, I suppose so, but copy-paste can help quite a bit (as with any variable name I suppose)

bramtayl avatar Dec 12 '20 01:12 bramtayl

The number of characters saved should be the same for the subscript variant and the original proposal in all cases where both are applicable. Regarding the number of characters needed, I'd rather count the number that needs to be read than the number that needs to be typed, since the former is the better metric for mental load IMO. So I also like the subscript variant as an alternative that is always applicable, but I again want to note that this and the original proposal are orthogonal. As such they can coexist ☺️ (and I still favor the underscore variant because it takes argument searching off my brain), so I'd be fine with having both in the end!

rapus95 avatar Dec 12 '20 01:12 rapus95

Maybe the bare subscripts are too easy to confuse with numbers? _, _₂, _₃ etc. might be nice. For the common use case of 1 argument, it's still just 1 ASCII character.

-> _findin(I[_], _ < n ? (1:sz[_]) : (1:s)
-> I[_][_findin(I[_], _ < n ? (1:sz[_]) : (1:s))]
-> firstindex(A,_):firstindex(A,_)-1+@inbounds(halfsz[_])
-> _ == dims[1] ? (mid:mid) : (firstindex(A,_):lastindex(A,_))
-> idxs[_]==first(tailinds[_])
-> string("args_tuple: ", _, ", element_val: ", _[1], ", task: ", tskoid()), input)
-> (batch_refs[_[1]].x = _[2]), enumerate(results))
-> Symbol(_[1]) => _[2]
-> !_.from_c && _.func === :eval
-> convert(fieldtype(T, _), x[_])
-> (_.filename, _.mtime)
-> !(_ === empty_sym || '#' in string(_))
-> _ == dims ? UnitRange(1, last(r[_]) - 1) : UnitRange(r[_])
-> _ == dims ? UnitRange(2, last(r[_])) : UnitRange(r[_])
-> getfield(sym_in(_, bn) ? b : a, _)
-> !isempty(_) && _ != "."
-> iperm[perm[_]] == _
-> _ == k ? 1 : size(A, _)
-> _ == k ? Colon() : idx[_]
-> _ isa Integer ? UInt64(_) : String(_)
-> !isa(_, DataType) || !(_ <: Tuple) || !isknownlength(_)
-> at.val[_] isa fieldtype(t, _)
-> !isa(_, SSAValue) || !(_.id in intermediaries)
-> last_stack[_] != stack[_]
-> (_ = new_nodes_info[_]; (_.pos, _.attach_after))
-> _ != 0 && !(_ in bb_defs)
-> !(isa(_, LineNumberNode) || isexpr(_, :line))
-> _₂ * (_ - _₃)
-> prepend!(_₂, _)
-> _.x = f(_₂...)
-> itr.results[_] = itr.f(_₂...)
-> _ | ~_₂
-> ~_ | _₂
-> ~xor(_, _₂)
-> ~_ & _₂
-> _ & ~_₂
-> _ .+ _₂
-> Dict{_, _₂}
-> :($_ || ($expr == $_₂))
-> _₂.status == "503"
-> isa(_₂, IOError)
-> IdDict{_, _₂}
-> _:_₂
-> _:_₂:_₃
-> first(_)-first(_₂)
-> isless(_[2], _₂[2])
-> lt(by(_), by(_₂))
-> (_₂; print(_, idx > 0 ? lpad(cst[_₃], nd+1) : " "^(nd+1), " "); return "")
-> (print(rpad(string(_) * "  ", $maxlen + 3, "─")); Base.time_print(_₂ * 10^9); println())
-> _(_₂)
-> wait(Threads.@spawn _(_₂))
-> f(_₂) ? (_..., _₂) : _
-> WeakKeyDict{_, _₂}
-> _₂; true
-> printer(_, _₂, _₃ > 0 ? code.codelocs[_₃] : typemin(Int32))

bramtayl avatar Dec 14 '20 21:12 bramtayl

Also, according to wikipedia,

Sometimes, subscripts can be used to denote arguments. For example, we can use subscripts to denote the arguments with respect to which partial derivatives are taken.

bramtayl avatar Dec 14 '20 21:12 bramtayl

Once again, please don't intermix the individual underscore with a more complex notation. I see why you still try to push the "all individual underscores refer to the same argument" concept, but I guess we already explained why it doesn't scale.

Also, for my own feelings, I don't count like "", "2", "3", .... So I propose to either count everywhere or not count at all; that way it feels more consistent to me. Also, that way we have 2 different syntaxes to define independently, which is nice.

But again, if I could, I'd veto intermixing single-underscore with anything that is not single-underscore. For orthogonal design reasons.

rapus95 avatar Dec 15 '20 10:12 rapus95

How about unicode circled numbers, e.g. ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨. They could be given a nice shortcut \o1 == ①.

-> printer(①,②,③ > 0 ? code.codelocs[③] : typemin(Int32))

clarkevans avatar Dec 15 '20 14:12 clarkevans

Oooh, I like that!

bramtayl avatar Dec 15 '20 15:12 bramtayl

I think this proposal saves so little typing, and adds little or no clarity, compared to explicit x -> ... that it doesn't seem worth it.

The only point of a new syntax here is to abbreviate (and add clarity) for anonymous functions in the common case of very short expressions. After all of this discussion, and repeatedly finding myself wanting an underscore syntax in real-world cases, I keep circling back to the conclusion that Scala's rule is best: consume a single function call, and anything else can use x -> .....

stevengj avatar Dec 23 '20 17:12 stevengj

@stevengj I feel like you mixed up both (albeit orthogonal) anonymous-function issues. The capture scope is only relevant in the other issue, while this issue is meant to focus on short lambdas with multiple arguments. There the largest savings in chars are for 2 arguments, where it cuts the number of characters needed for the lambda syntax in half, i.e. 4 instead of 9, and as Stefan showed in https://github.com/JuliaLang/julia/issues/38713#issuecomment-740092621 it is at the same time capable of shifting the focus to where it belongs: the actual action that is carried out. Anything that goes beyond a single underscore is out of scope of this proposal because it is orthogonal to it.

The only point of a new syntax here is to abbreviate (and add clarity) for anonymous functions in the common case of very short expressions.

And that's exactly what the proposal is capable of. ->mapreduce(_,+,_,init=0) imo is VERY clear and effectively reduces boilerplate to 0, because it only has the lambda indicator and a single char per argument, which visually stands out. Do you see any way to reduce the number of characters needed any further without becoming ambiguous or less clear?

rapus95 avatar Dec 27 '20 11:12 rapus95

And that's exactly what the proposal is capable of. ->mapreduce(_,+,_,init=0) imo is VERY clear

The other PR (#24990) already supports mapreduce(_,+,_,init=0) with no -> at all, which is even more compact and clear.

The only reason to have a -> fence is to capture nested calls, but at that point the expressions are becoming complicated enough that the savings vs. x-> or (x,y)-> are not really worth a new syntax in my opinion.

stevengj avatar Dec 27 '20 13:12 stevengj

I know I mentioned this above, but maybe this is a better way to put it. Querying is not a domain-specific application, but a handy syntax style that can be used for a variety of problems. You can't really use tight currying for querying. That is, df |> filter(_.age > 50, _) would not transform into df |> x -> filter(y -> y.age > 50, x). On the other hand, df |> filter(-> ①.age > 50, ①) would do the job nicely.
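For reference, the target expansion written out in today's syntax (parentheses keep the lambda inside its pipeline stage; this assumes df is a plain collection of row-like objects):

df |> (d -> filter(row -> row.age > 50, d))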

bramtayl avatar Dec 27 '20 21:12 bramtayl

In that example the syntax of this PR saves you at most one character over df |> filter(x-> x.age > 50, _) ...

stevengj avatar Dec 27 '20 23:12 stevengj

I'm not sure I'm following? If you compare:

df |> x -> filter(y -> y.age > 50, x) with df |> filter(-> β‘ .age > 50, β‘ ) I see four fewer characters (but I think more importantly, much less of trying to figure out what x and y mean)

bramtayl avatar Dec 28 '20 13:12 bramtayl

In that example the syntax of this PR saves you at most one character over df |> filter(x-> x.age > 50, _) ...

I have absolutely no idea what this could/should mean. The main problem isn't the number of characters to be typed, but code maintenance -- often by someone who didn't author the original work (and for me, all it takes is a few months, and I could swear I wasn't the implementer... till I look at the commit records and wonder what could have possibly possessed me).

clarkevans avatar Dec 28 '20 13:12 clarkevans

If you compare df |> filter(-> β‘ .age > 50, β‘ ) to df |> filter(x-> x.age > 50, _) (tight underscore currying) you are saving 1 character.

I have absolutely no idea what this could/should mean

I was referring to the Scala-like tight underscore currying in the other PR. It's a matter of taste, but it doesn't seem hard to learn that f(x,_) is an anonymous function y -> f(x,y).
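
For illustration, the tight-currying reading spelled out in current syntax (a small sketch; the function and sample values are arbitrary):

f(x, y) = x - y
g = y -> f(10, y)   # what f(10, _) would mean under the tight-currying rule
g(3)                # == 7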

(To me, filter(-> β‘ .age > 50, β‘ ) is confusing.)

stevengj avatar Dec 28 '20 14:12 stevengj

Put another way, I don't see much point in discussing a new completely general anonymous-function syntax, which is what this -> β‘ ... syntax is on the verge of becoming (it handles 99.9% of the cases where -> is used). We already have such a syntax and we're not going to get rid of it.

stevengj avatar Dec 28 '20 14:12 stevengj

(To me, filter(-> β‘ .age > 50, β‘ ) is confusing.)

I agree; and so are the other implicit options.

clarkevans avatar Dec 28 '20 14:12 clarkevans

To me, there is no point in debating anything other than an implicit option, because explicit "fenced" options (like in this PR) will inevitably overlap too much with the current -> syntax and add too little improvement.

stevengj avatar Dec 28 '20 14:12 stevengj

Hmm, well, actually, we already have two syntaxes for anonymous functions:

function (x)
    x + 1
end
x -> x + 1

Seems like not too big of a deal to add a third one:

-> β‘  + 1

Note that, as you move up the list, the syntax becomes more verbose but only slightly more powerful

bramtayl avatar Dec 28 '20 15:12 bramtayl

I was proposing a cleaner style for a certain subset of lambdas which can't be handled by the fence-less proposal of the other issue. Hence my considerations about orthogonality. Sure, the other proposal has some overlap where both can be used, but I don't see any lambda proposal that gets near the clarity/conciseness of ->f(2_)+_

rapus95 avatar Dec 28 '20 15:12 rapus95

->f(2_)+_

What would this mean?

clarkevans avatar Dec 28 '20 17:12 clarkevans

-> --| ---|
     V    V
->f(2_) + _

as such (x, y) -> f(2x) + y

rapus95 avatar Dec 28 '20 17:12 rapus95

Seems like not too big of a deal to add a third one:

-> β‘  + 1

I have to agree with @stevengj here. This only saves one character over the current syntax, is more of a pain to type and IMHO also harder to read. I think it's also generally a good idea to try to stick to ASCII characters for language features, since Unicode is not always well supported in all editing environments.

simeonschaub avatar Dec 28 '20 18:12 simeonschaub

Even more important: circled numbers, once again, are an orthogonal issue. Can we please stick to the individual-underscore variant which was proposed? Increasing the number of different approaches discussed in a single issue never does anything but stall the actual proposal... Also, I don't want to complicate the proposal, since if we make it a little more complicated it isn't easier or clearer than an ordinary lambda anymore. And please stop focusing your "calculations" on single-argument lambdas: those are handled in the other proposal, so you wouldn't use the fenced approach for them anyway, since it wouldn't provide any benefit over the other proposal.

So please keep this focused on simple (nested) multi-argument lambdas.

rapus95 avatar Dec 28 '20 18:12 rapus95

Also, please stop focusing your "calculations" on single argument lambdas because these are handled in the other proposal

Multi-argument lambdas are handled in the other proposal (or rather, in the working PR) as well. The new thing here is nested calls, using a headless -> fence, where you are saving typing x (single-arg) (x,y,...) (multi-arg).

stevengj avatar Dec 28 '20 19:12 stevengj

Following Aaron's observation that the interpretation of multi-argument lambdas is handled in #24990, I've commented there instead of here. The skinny is, I think that using the Nth underscore for the Nth argument is not supported by the examples @bramtayl provided. With regard to nested lambdas, as someone who has to maintain other people's scripts, I think it's confusing. That said, it's not something I have a strong opinion about.

clarkevans avatar Dec 28 '20 22:12 clarkevans

Personally, I think the "character counting" criterion for this feature is misguided. For me at least, this is not mainly about saving typing. What it is about, as I described above, is expressing an operation in a way that is syntactically focused on the operation and not on the arguments that are beside the point. Writing (x, y) -> x[y] puts the focus on x and y whereas -> _[_] puts the focus on the [ ] operation; admittedly not as much as just _[_], but automatically deciding how much expression to consume seems hard after all the discussion in #24990 (although I thought we had gotten to a pretty good rule towards the end). Note that we can introduce this explicitly delimited version and later add a rule for implicitly inserting the delimiter.

StefanKarpinski avatar Dec 29 '20 03:12 StefanKarpinski

Writing (x, y) -> x[y] puts the focus on x and y

Not for me β€” I think my eyes have learned to move automatically to the body of lambda expressions. This is also something that an editor can handle easily (gray out/visually de-emphasize the (x, y) ->).

I think that this kind of visual focus happens automatically with most notation after a bit of exposure, eg in ∫ f(x) dx most people look at the f(x) first. A lot of mathematical notation is seemingly "redundant" this way, but it serves an important purpose: clarity and readability.

Personally I prefer to write out the arguments (x, y) -> in exchange for not having to think about how the _ expansion works. While this proposal is clearest of all of the similar ones, it still involves locating and counting the _s (eg to determine arity).

tpapp avatar Dec 29 '20 08:12 tpapp

Put another way, "saving typing" is a proxy for the observation that for very short expressions, x -> and -> add visual noise that impedes clarity. Compare all(_ > 2, x) with all(y -> y > 2, x) or even all(-> _ > 2, x). Or, for the multi-arg case, compare reduce((x,y) -> f(x,y,data), a) with reduce(f(_,_,data), a). While, for anything that's not a very short expression, our current syntax is fine.

@tpapp, "locating and counting" the underscores is no longer a chore if all the underscores are required to be arguments of a single function call. (But it seems clear that the most common use cases of an implicit "headless" lambda syntax will be single-argument lambdas. See also https://github.com/JuliaLang/julia/pull/24990#issuecomment-752110486 for a survey of single-call lambdas in Base.)

stevengj avatar Dec 29 '20 15:12 stevengj

Regarding the readability by focusing on operations, I think it's not that clear cut: sometimes _ really helps to avoid meaningless names, but names can often be chosen judiciously to make code more readable. Compare this:

map(-> _[_], arrays, indices)   # looks like some Perl got copy-pasted here :)

with this:

map((A,i) -> A[i], arrays, indices)

The second one is arguably more readable or beginner-friendly.

But I think this proposal introduces serious issues in the interaction with #24990. The original comment says

And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to just put 2 more characters (->) in the right place to make that underscore once again standalone.

It's nice that a fix is only 2 characters, but the main problem is about readability and reliability of behavior:

Pipe issues

-> doesn't play very nicely with pipes, as mentioned on Discourse:

1:2 .|> x->x^2 |> sum |> inv    # Result may not be what you expect

#24990 alone helps to side-step this problem:

1:2 .|> _^2 |> sum |> inv    # Probably what you expect
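
For concreteness, here is how the two snippets above group, written out with explicit parentheses (both lines run on current Julia; the second spells out the #24990 reading with an explicit lambda):

1:2 .|> (x -> (x^2 |> sum |> inv))      # [1.0, 0.25]: the -> swallows the rest of the chain
((1:2) .|> (x -> x^2)) |> sum |> inv    # 0.2: the likely intent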

A headless -> however has the same issue as x->x^2, and the combination with #24990 brings additional troubles:

value |>
  -> _^2 |>
  log(3, _)

This would actually mean value |> x -> (x^2 |> log(3, x)). Thankfully this one would give an error rather than the "wrong" result.

The precedence of a headless -> could be made different from that of normal -> but that would be even worse (more confusing) I think.

Parser instability

With this proposal, something like map(log(3,_), A) takes a completely different meaning when copy pasted here:

f(A) = -> _ .+ map(log(3,_), A)
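
Spelled out in current syntax (my reading of the two interpretations; the argument names are invented):

standalone(A) = map(x -> log(3, x), A)          # map(log(3,_), A) on its own, under #24990
pasted(A) = (a, b) -> a .+ map(log(3, b), A)    # the same text inside the headless lambda;
                                                # calling this errors, since log(3, b) is not a function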

I find it more readable with an explicit name for the -> lambda:

f(A) = B -> B .+ map(log(3,_), A)

(The = -> _ .+ ASCII jumble was unintended but could be used to make another point :) )

The first point I want to make here is that this proposal enables larger expressions with _ in different places, and it quickly becomes hard to tell which _ are the same value and which are not.

By contrast, with #24990 alone, the nameless placeholder always has local effect which is great for readability.

My second point: it's unsettling that a "big" expression like map(log(3,_), A) gets parsed differently when -> is inserted higher in the AST. This is what macros do! When I see a big round @ I know that code is getting rewritten. For me this is a strong reason to prefer @_ for this behavior.

knuesel avatar Dec 29 '20 16:12 knuesel

Comparison of Visual Penalties

Writing (x, y) -> x[y] puts the focus on x and y

Not for me β€” I think my eyes have learned to move automatically to the body of lambda expressions. This is also something that an editor can handle easily (gray out/visually de-emphasize the (x, y) ->).

Well then, ditch the head and tell me which are the bound variables (i.e. the ones that will be filled in).

(a,h,m) -> somelargefunctionname(f, inv(a), g, h, x, b, kw=m, okw=r)

        -> somelargefunctionname(f, inv(_), g, _, x, b, kw=_, okw=r)

Now, how often did you move your eyes back to check the order of the arguments and which arguments are bound at all? For the underscore case it's enough to scan the line once to know both, since you know for sure that only underscores will be bound and that they are filled in order of appearance.

I think that this kind of visual focus happens automatically with most notation after a bit of exposure, eg in ∫ f(x) dx most people look at the f(x) first. A lot of mathematical notation is seemingly "redundant" this way, but it serves an important purpose: clarity and readability.

That actually is an argument in favor of having -> as the indicator for a lambda since it will be handled intuitively in the same way as the integral sign in your example.

Personally I prefer to write out the arguments (x, y) -> in exchange for not having to think about how the _ expansion works. While this proposal is clearest of all of the similar ones, it still involves locating and counting the _s (eg to determine arity).

Underscore case: locating, determining order and counting can be done in a single pass, i.e. you don't have to go back and forth. Identifier case: locating, determining order and counting can take multiple passes, because you need to look up whether an identifier name is bound or not, and even if you spotted them right away you need to "unshuffle" the ordering if they don't appear in the same order as in the head. Since you can't rely on the head being in the order of occurrence, you have to check it.


What to measure and what not

Put another way, "saving typing" is a proxy for the observation that for very short expressions, x -> and -> add visual noise that impedes clarity. Compare all(_ > 2, x) with all(y -> y > 2, x) or even all(-> _ > 2, x). Or, for the multi-arg case, compare reduce((x,y) -> f(x,y,data), a) with reduce(f(_,_,data), a). While, for anything that's not a very short expression, our current syntax is fine.

Sure, since #24990 started to let multiple underscores refer to multiple arguments, and thus has more cases in common with this proposal, those cases can't be counted against the advantage of this proposal anymore. Doing a comparison only on the cases where both are applicable, and thus where #24990 leads the comparison, is fairly unfair. So let's look at the cases that are still left over, since they only work with this proposal; for the shared cases one wouldn't consider using this one anyway, so let's ignore them (though both proposals must keep the same behaviour on these shared cases). Thus, only the slightly nested cases remain here. Treat this proposal as a generalization of #24990 but still a strict subset of the ordinary lambda, since we can't reuse arguments.

@tpapp, "locating and counting" the underscores is no longer a chore if all the underscores are required to be arguments of a single function call. (But it seems clear that the most common use cases of an implicit "headless" lambda syntax will be single-argument lambdas. See also #24990 (comment) for a survey of single-call lambdas in Base.)

Well, locating underscores should generally not be difficult, given that they stand out a lot in average code lines. If that's still too hard, adding them to the highlighter will let them burn your eyes 😁


Regarding the readability by focusing on operations, I think it's not that clear cut: sometimes _ really helps to avoid meaningless names, but names can often be chosen judiciously to make code more readable.

Well, yes, if you want to write ugly code you can. Nothing will change about that. We already stated that a lot of cases are better handled by actual written-out ordinary lambdas. The argument, which I will refer to as the bad-style measurement, is the same as "we must not allow operator overloading because people could go nuts with it". It's true. But we can't enforce good coding style anyway, so let's focus on the opportunities that arise if we don't include bad-style measurements in our decision making.

Claims about incompatibilities between #24990, piping and this proposal

But I think this proposal introduces serious issues in the interaction with #24990. The original comment says

And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to just put 2 more characters (->) in the right place to make that underscore once again standalone.

It's nice that a fix is only 2 characters, but the main problem is about readability and reliability of behavior:

Pipe issues

-> doesn't play very nicely with pipes, as mentioned on Discourse:

1:2 .|> x->x^2 |> sum |> inv    # Result may not be what you expect

So your argument is that because ordinary lambdas don't play nicely with pipes, we have a design issue in a shorter lambda notation that should not behave differently (as you conclude yourself further down)? I also think we should stick to equivalent behaviour here and rather think about overhauling the pipe precedence in Julia 2.0, because that's the actual problem underlying the argument.

#24990 alone helps to side-step this problem:

1:2 .|> _^2 |> sum |> inv    # Probably what you expect

Of course! Because we have chosen the strict consume-1-call variant. If it resembled generic lambdas then it would induce the same problems. So that's not a special feature of the fence-less lambda but rather a lucky coincidence of the capturing rules. And to be fair I'm not sure if I'd consider values |> map(_<1, _) to be strictly more concise than values |> ->map(->_<1, _) or, if |> becomes special-cased as a properly scoped lambda fence, values |> map(->_<1, _). But since I said I don't want bad-style measurements to decide the quality of a feature, I won't consider that a disadvantage. It's just that, to me, the last one is the most concise and reusable one, because it clearly shows that there are different scopes and it allows modifying and moving the code around without breaking it.

A headless -> however has the same issue as x->x^2, and the combination with #24990 brings additional troubles:

value |>
  -> _^2 |>
  log(3, _)

This would actually mean value |> x -> (x^2 |> log (3,x)). Thankfully this one would give an error rather than the "wrong" result.

It would mean value |> (x,y) -> (x^2 |> log(3, y)) because different underscores become different arguments. Still, it would error. But once again, that's a problem of the way lambdas and pipes interact, i.e. this problem as well boils down to the bad precedence interaction of the pipe operator. Having the pipe operator act as a headless lambda fence with proper scoping for unbound underscores would solve that issue for the general case (i.e. in more cases than #24990), e.g. values |> map(->_^2, _) |> map(f, _), but would at the same time cost us the ability to use #24990 in pipes. So in order not to lose the ability to use #24990 with the pipe operator syntax, we could move that headless lambda meaning, combined with tuple splatting, into |>>, which allows for really cool stuff like this:

(f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3

which doesn't even have a compact written out form. But since it's a new operator that can be decided at a later point. (I.e. kind of orthogonal to this issue)
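
For reference, a runnable equivalent of that chain in today's syntax, using Base.splat for the splatting steps (the sample inputs are made up):

f, A, B = abs, [-3, 1], [2, 2]
(f, A, B) |> Base.splat((g, x, y) -> (map(g, x), y)) |> Base.splat(+) |> extrema |> Base.splat((a, b) -> (2a + b) / 3)
# map(abs, A) == [3, 1]; [3, 1] + [2, 2] == [5, 3]; extrema == (3, 5); (2*3 + 5)/3 == 11/3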

The precedence of a headless -> could be made different from that of normal -> but that would be even worse (more confusing) I think.

Exactly: keep them behaving exactly the same.

Parser instability

With this proposal, something like map(log(3,_), A) takes a completely different meaning when copy pasted here:

f(A) = -> _ .+ map(log(3,_), A)

As I said earlier, #24990 isn't meant to be moved around and reused, since even the slightest modification can break it. And if you want to move it around blindly then you will most probably do so by copy-pasting, so the size of the snippet with an extra -> is not relevant. And blindly copying code around is known to be quite bug-prone anyway. So if you want to be able to move your code around or make it a template, you generally should rely on closed explicit forms like the ordinary lambda or the proposal here.

I find it more readable with an explicit name for the -> lambda:

f(A) = B -> B .+ map(log(3,_), A)

(The = -> _ .+ ASCII jumble was unintended but could be used to make another point :) )

Again, bad example measurement. Too much ASCII jumble? Go for the ordinary form, as you would have to if this proposal didn't exist.

The first point I want to make here is that this proposal enables larger expressions with _ in different places, and it quickly becomes hard to tell which _ are the same value and which are not.

Oh that one is easy! None are the same value since they all resolve to different arguments. And even then it's nothing more difficult than finding which x are which in x->map(x->x<2, x). So that again is a bad example measurement.

By contrast, with #24990 alone, the nameless placeholder always has local effect which is great for readability.

Within cluttered code you probably are right. But here's the trick: if the advantage of fenceless lambdas is so high that it would be an argument for not using a headless lambda (as designed in this proposal), then, well, don't use a headless lambda in that scope.

My second point: it's unsettling that a "big" expression like map(log(3,_), A) gets parsed differently when -> is inserted higher in the AST. This is what macros do! When I see a big round @ I know that code is getting rewritten. For me this is a strong reason to prefer @_ for this behavior.

I wouldn't call that "unsettling" but "strictly wanted behaviour". It's the same as scoping/shadowing: if you suddenly introduce a local definition for a locally unbound variable somewhere before it in the AST, the result is that it gets bound. If you want to avoid binding an unbound variable, then do a quick scan for an unbound occurrence of that variable. If you want to avoid binding an unbound underscore, do a quick scan for an unbound underscore. Here the highlighter is also a valid way to assist spotting these, by highlighting unbound underscores differently than bound ones. Also: you have to be careful about copying underscores around anyway, given that there are multiple packages with macros that rewrite underscores. So this one won't change much about having to be careful.

rapus95 avatar Dec 30 '20 00:12 rapus95

so your argument is that because ordinary lambdas are not playing nicely with pipes we have a design issue in a shorter lambda notation that should not behave differently (as you conclude yourself further down)?

My argument is that when #24990 can be used, it provides a welcome workaround around this problem with ordinary lambdas and pipes, but this headless proposal breaks it.

If it'd resemble generic lambdas then it would induce the same problems. So that's not a special feature of the fence-less lambda but rather a lucky coincidence of the capturing rules.

It doesn't matter: #24990 works well with pipes, but this proposal breaks it.

to be fair I'm not sure if I'd consider values |> map(_<1, _) to be strictly more concise than values |> ->map(->_<1, _) [...] to me the last one is the most concise and reusable one [...]

Is this not overselling your proposal a bit? 😊 I'd hope we can at least agree that the first one is more concise...

Having the pipe operator act as a headless lambda fence with proper scoping for unbound underscores would solve that issue for the general case (i.e. in more cases than #24990). E.g. values |> map(->_^2, _) |> map(f, _) but at the same time cost us the ability to use #24990 in pipes.

Special-casing the |> operator is an option. I'm not sure what to think of it. It has obvious advantages but it makes the |> and _ features less orthogonal, and it's one more rule to learn and to think of when trying to make sense of underscores. When it was discussed in #24990, there was some push-back asking what then of <| and ∘ and how it scales in the future with additions such as ⨟. See https://github.com/JuliaLang/julia/pull/24990#issuecomment-600268361 and https://github.com/JuliaLang/julia/pull/24990#issuecomment-600414318.

So in order to not to lose the ability to use #24990 with the pipe operator syntax we could move that headless lambda meaning combined with tuple splatting into |>> which allows for really cool stuff like that:

(f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3

What I like most about #24990 is the simplicity and readability. For me this goes in the other direction (powerful but dense, almost opaque syntax, which I'd rather support with a macro than in the core language).

As I said earlier #24990 isn't meant to be moved around and reused since even the slightest modification can break it.

It's this proposal that breaks it. On its own, #24990 can be moved around and reused without issue, a really nice property! I think it would be a shame to lose it.

if the advantage of fenceless lambdas is so high that it would be an argument for not using a headless lambda (as designed in this proposal), then, well, don't use a headless lambda in that scope.

My concern is not about writing code but reading it (maybe not my own code). With #24990 it's trivial to interpret any _. This is lost when introducing headless lambdas.

Also, of course we can always say "just don't do that" (reminds me of C++) but it's nice when the language design helps to avoid pitfalls.

It's the same as scoping/shadowing if you suddenly introduce a local definition for a locally unbound variable somewhere before it in the AST. Result: it gets bound.

Indeed, a source of pain (for beginners at least). It would be nice to avoid adding another one.

You have be careful about copying underscores around anyway given that there are multiple packages with macros that rewrite underscores.

Yes, but that's the whole point of macros: they rewrite code. And they come with this distinctive @ symbol. I think headless lambdas are a great fit for a macro. Do you have a strong objection to using @_ rather than ->?
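
For illustration, a rough sketch of what such an @_ macro could look like (not an existing macro; the underscore collection is deliberately naive and ignores nesting and quoting):

macro _(ex)
    args = Symbol[]
    rewrite(x) = x                                   # literals, line numbers, etc. pass through
    rewrite(s::Symbol) = s === :_ ? (a = gensym("arg"); push!(args, a); a) : s
    rewrite(e::Expr) = Expr(e.head, map(rewrite, e.args)...)
    body = rewrite(ex)                               # each _ becomes a fresh argument, left to right
    esc(Expr(:->, Expr(:tuple, args...), body))      # wrap the rewritten body in a lambda
end

# intended use (sketch): @_ log(3, _) + _   rewrites to roughly   (arg1, arg2) -> log(3, arg1) + arg2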

knuesel avatar Dec 30 '20 10:12 knuesel

My argument is that when #24990 can be used, it provides a welcome workaround around this problem with ordinary lambdas and pipes, but this headless proposal breaks it.

I don't see where this proposal breaks it. If you can use #24990 to work around the problem of ordinary lambdas with pipes, then why would you use a lambda at all? It doesn't matter whether it's headless or not. You're conflating two things here: unexpected precedence and unintended underscore consumption. Your argument uses #24990 to work around the unexpected precedence, and then you say that this proposal breaks it. If that's true then ordinary lambdas break YOUR proposal, because an ordinary lambda would break it in the same way. What you are actually trying to make a point of is again the unintended consumption of an underscore. (I'll say more about it further down.)

It doesn't matter: #24990 works well with pipes, but this proposal breaks it.

Only as long as your function is almost trivial, because in every other case #24990 isn't even applicable. So yes, there are a few cases where #24990 works well with pipes, but in a lot of cases it isn't even applicable, and in those cases you will always have unexpected precedence since you have to fall back to ordinary lambdas. With this proposal you can still often have a more concise short lambda syntax at hand, which, by the way, works exactly the same as ordinary lambdas even for multiple pipes if you don't intermix both proposals. Behaving the same as an ordinary lambda, just with shorter and, when properly used, more concise syntax, is the intention of this proposal. Only changing the precedence, or introducing an additional pipe operator with fixed precedence, will fix that.

It's this proposal that breaks it. On its own, #24990 can be moved around and reused without issue, a really nice property! I think it would be a shame to lose it.

I only think it would be a shame to encourage mindlessly copying around code you don't understand. Call, don't copy. Or understand your code. I don't get why you focus so much on the copy case. For the record: reuse is the opposite of copying, i.e. high reusability means not having to copy code around but being able to just call the functions where they already exist.

TRIGGER-WARNING Given that I don't think we should base syntactical decisions about shorter-for-conciseness variants on developers who write their code by copy-pasting without understanding it (it always reminds me of Scratch, btw a nice tool for playing around with code without having to care about scopes or the AST too much), I would suggest finally making the copy-metric a non-metric.

#24990 is ~perfect~ nice and this proposal breaks it

No. No. No. We don't break functionality per se. They just can't be used together mindlessly. In other words: you don't like this feature? Fine, don't use it in your code! You won't get into any trouble. You won't even experience any differences, and it will be as if this feature never existed. That's it. I also think it will be easy to decide line by line whether you use one proposal or the other, since almost all use cases of both proposals are one-liners. For large multi-liners neither will be used much and most certainly wouldn't be that concise anyway.

Anyway, you are trying to make the proposals mutually exclusive while they actually aren't. If they were, I would have to ask you why you think either of the two proposals has more right to exist than the other, and why we should exclusively take the less capable variant. But that's not the case. They are not mutually exclusive.

Here are our distinct mindsets once again: you try to restrict based on what could go wrong, while I try to allow multiple syntactical approaches with different tradeoffs. In Julia we have a lot of TMTOWTDI (there's more than one way to do it). For example, have a look at comprehensions <-> maps <-> iterators. If we went for "nah, we only want a single way to do it", then, well, #24990 wouldn't be a thing either, since we already have lambdas. But we want TMTOWTDI! We can have both features, as with all other "redundant" features that likewise embody different tradeoffs. As long as the behaviour is deterministic, which it would be thanks to scoping, there's no reason to select one exclusively here.


to be fair I'm not sure if I'd consider values |> map(_<1, _) to be strictly more concise than values |> ->map(->_<1, _) [...] to me the last one is the most concise and reusable one [...]

Is this not overselling a bit your proposal? 😊 I'd hope we can at least agree that the first one is more concise...

I don't agree, given that you're intentionally using underscores in close positions, even in the same scope, that resolve to different scopes. So it's not overselling but a matter of taste. The variant with the single -> instead shows that there are different scopes by explicitly introducing a new scope, and it probably won't surprise anyone that lambdas are involved πŸ˜„. That's why I find it more concise. I'd agree yours is the shortest, but not the most concise. (And since the -> binds the underscore in the first argument, it should be a good thing for you, since it is more robust against being copied around. But still, that's a non-metric.)

Having the pipe operator act as a headless lambda fence with proper scoping for unbound underscores would solve that issue for the general case (i.e. in more cases than #24990). E.g. values |> map(->_^2, _) |> map(f, _) but at the same time cost us the ability to use #24990 in pipes.

Special-casing the |> operator is an option. I'm not sure what to think of it. It has obvious advantages but it makes the |> and _ features less orthogonal, and it's one more rule to learn and to think of when trying to make sense of underscores. When it was discussed in #24990, there was some push-back asking what then of <| and ∘ and how it scales in the future with additions such as ⨟. See #24990 (comment) and #24990 (comment).

Yes, that's why I suggested creating a new operator for it. Then it remains entirely orthogonal. Btw, #24990 is also one more rule to learn. So again, it's a matter of taste which rule is more worthy, not an objective fact to use as an argument. If I were to use it as an argument I'd say: ditch #24990 and go with this proposal plus special-casing of |>. Then there would be only a single extra rule about underscores, namely that lone underscores on the RHS bind to the next pipe operator or headless ->. That should be in your favor, since it allows more cases with still only a single rule to learn about underscores. In particular, it would work with pipes flawlessly! But I don't want to ditch #24990. I want both. (Or, using some famous words, "We want it all" 😏, though that "we" may only refer to myself since I don't know much about other people's minds.)

So in order to not to lose the ability to use #24990 with the pipe operator syntax we could move that headless lambda meaning combined with tuple splatting into |>> which allows for really cool stuff like that:

(f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3

What I Iike most about #24990 is the simplicity and readability. For me this goes in the other direction (powerful but dense, almost opaque syntax, I'd rather support with a macro than in the core language).

I still feel like you use "shortness", "readability", "simplicity" and "conciseness" interchangeably, in which case you would have to like my proposal about |>>. Even assuming that you don't, I don't see where it hurts readability, given that each pipe (|) effectively marks a border of evaluation (thus reducing scan range and mental load a lot) and the number of > plainly shows whether the object is inserted as a single argument (= one >) into the function provided or splatted across multiple arguments/underscores (= multiple >). And given that it uses a new operator, it should be easy not to confuse the current mental model, and it definitely won't break code. Once again, you are free not to use it. And it's definitely less cumbersome than wrapping something in Base.splat.

if the advantage of fenceless lambdas is so high that it would be an argument for not using a headless lambda (as designed in this proposal), then, well, don't use a headless lambda in that scope.

My concern is not about writing code but reading it (maybe not my own code). With #24990 it's trivial to interpret any _. This is lost when introducing headless lambdas.

My concern is also about reading code: reading code that can't be handled by #24990 (meanwhile there actually are examples where this proposal would be concise while being shorter than an ordinary lambda). And just for the record, interpreting a lone underscore wouldn't be trivial anyway. It depends on LHS/RHS and it has subtle nuances: going from _*M to -_*M will break the code; same for 5*_+2 or -_-5, but not for -5-_. Doing an AST analysis (i.e. working out in which order the calls occur) doesn't feel that much easier to me than looking for a ->, since it requires an intrinsic understanding of the precedences etc., as opposed to just being able to find two characters.

Also, of course we can always say "just don't do that" (reminds me of C++) but it's nice when the language design helps to avoid pitfalls.

Sure, my approach to proper language design is to argue from the possibility of concise usage, rather than from the worst case. Because once again, there are cases for #24990 which are ugly and would justify not adding it if "it's better to avoid pitfalls by design". Heck, a lot of the syntactic sugar we have and love probably has such cases where "avoiding pitfalls" as a strategy would not have allowed it to come into existence. (It also leans a little bit towards the TMTOWTDI mindset.)

It's the same as scoping/shadowing if you suddenly introduce a local definition for a locally unbound variable somewhere before it in the AST. Result: it gets bound.

Indeed, a source of pain (for beginners at least). It would be nice to avoid adding another one.

A source of pain? Again, only if you mindlessly copy snippets around or try to edit code you didn't read. In all other cases you have different scopes (for example by calling instead of copying) or know which variables are taken. The copy-metric is a non-metric.

And even if you absolutely had to combine this proposal and #24990, getting the arity wrong will lead to an immediate error, thanks to Julia's strong enforcement of proper typing and thus proper combination of functions. And if you understand the code you write (which is important) you'll easily be able to find which argument should be a function (i.e. an unbound underscore).

Yes, but that's the whole point of macros: they rewrite code. And they come with this distinctive @ symbol. I think headless lambdas are a great fit for a macro. Do you have a strong objection to using @_ rather than ->?

Well yes, I guess the same as you: I don't want to use some external syntax for a feature so close to Base, even though it could be made as a macro. Otherwise, stop #24990, since it does AST restructuring, which is kind of worse than just renaming nodes, and it already makes #24990 struggle, because a new node is interleaved somewhere into the AST which doesn't have an underscore as its immediate subexpression and thus manipulates distant AST expressions somewhere further up in the tree. For this proposal we only automatically infer the argument list. More precisely, we add an element to the -> node while not changing the structure of the AST (i.e. the parents of all immediate or indirect subexpressions won't change) and rename a few underscore nodes (the structure still doesn't change). The node expressed by -> is already in the right place of the AST, and the structure won't change.
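
To illustrate what "only adding the head" means, here is the shape of the target AST with today's parser (the headless form itself doesn't parse yet, so the explicit head is written out):

ex = Meta.parse("(a, b) -> f(2a) + b")
ex.head       # Symbol("->")
ex.args[1]    # :((a, b)): the argument tuple the headless variant would synthesize
ex.args[2]    # the body, whose structure stays unchanged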

Having said all that, I again want to emphasize that I want #24990 and this proposal to coexist, each with its own distinct spot of usefulness. If for whatever reason both proposals are useful in the same place, the only thing to remember is to decide on one. If it can be fully handled by #24990, go for it. If it can't, here is the feature that can.

rapus95 avatar Jan 01 '21 20:01 rapus95

Even though many people want an anonymous function syntax that doesn't require the leading -> that's proposed here, it should be noted that these are not incompatible and the syntax without the leading -> can be considered a further abbreviation where you have some rule that tells you where to insert the ->.

StefanKarpinski avatar May 28 '21 19:05 StefanKarpinski

Which, IMO, would be the better way anyway, since that would make this proposal here the generic case and the fence-less lambda just a special case of it that could be implemented by a simple AST insertion. It would also make macros for fence-less lambdas trivial to implement, since it's kinda like "one out, insert fence". Julia accumulates so much elegance/expressiveness/performance by just combining perfectly chosen abstractions/specializations. πŸ™ˆ IMO the headless lambda fits into it perfectly well.

rapus95 avatar May 29 '21 20:05 rapus95

Having opened a duplicate of this myself I seem to really want this feature. πŸ˜„

So I may sum up my current position. (Still assuming complete orthogonality to https://github.com/JuliaLang/julia/pull/24990)

Proposal A: Introduce argument-less -> as a collecting fence/barrier/boundary for unbound underscores

Aim:

  • capture the essence of an action (in the mathematical sense) including all partial applications and the cases where we have nested or mixed scopes
  • providing a concise but unambiguous short-form for lambdas that has explicit scoping

Formally, this introduces -> as a unary/prefix operator variant which (unlike other unary operators) has the same precedence as the infix variant of ->.

It acts as a middle ground between implicit scoping and no syntactic sugar at all, preventing a false dilemma. It is particularly useful in cases where operators are defined but accessor functions are used (i.e. the outer call is not an operator, so #24990 doesn't apply), as well as in scenarios where operators aren't defined at all, as in a lot of non-mathematical contexts.

Examples

->sqrt(_^2+_^2) # euclidean distance
map(->exp(_*im), 0:10) # euler formula
map(->abs(_+_), x, y) # elementwise absolute value of the sum
filter(->abs(_.offset)>4, x)
filter(->!ismissing(_.v), x) # filter all where property v is not missing
filter(->real(_^2)>0, x) # filter all where the square has positive real part
reduce(->merge(_,normalize(_)), x, init=EmptyOne()) #custom types that don't use/define operators
->(x->_+_*x+_*x^2+_*x^3) # cubic polynomial builder

# some more cases that also work with #24990 where the only benefit here is the explicit scoping for those who prefer it
->_[_] #indexing (outer scope) (also works with #24990)
->_(_) #application (outer scope) (also works with #24990)
->_(_...) #splatting (inner and outer scope) (=Splat(_)(_), might work with #24990?)
->_[1] #indexing (outer scope) (also works with #24990)
->A[_] #indexing (inner scope) (also works with #24990)
->_(5) #application (outer scope) (also works with #24990)
->f(_) #application (inner scope) (also works with #24990)
->_(B...) #splatting (outer scope) (also works with #24990)
->f(_...) #splatting (inner scope) (won't work with #24990 due to nesting)

This allows for more specific(/ugly/concise) usage. But explicit scoping ideally assists understanding what is happening nevertheless.
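
For concreteness, two of the examples above written out in today's syntax (the headless form itself doesn't parse yet; argument names are invented):

hypot2 = (a, b) -> sqrt(a^2 + b^2)                       # ->sqrt(_^2+_^2)
hypot2(3, 4)                                             # 5.0
cubic = (a, b, c, d) -> (x -> a + b*x + c*x^2 + d*x^3)   # ->(x->_+_*x+_*x^2+_*x^3)
cubic(1, 0, 0, 1)(2)                                     # 9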

Benefits

  • explicit head for parsing
  • obvious relation to lambdas since it also uses the ->
  • eases and increases readability because a) you know that all following underscores will belong to the same anonymous function initiated by -> (easier than #24990), b) you know right away in which order the arguments are used and thus don't have to scan back and forth (easier than ordinary lambdas, whose argument lists may use arguments in another order than passed), and c) it benefits reading over writing
  • non-breaking since it's currently invalid syntax:
    julia> map(->abs(_+2), 1:10)
    ERROR: syntax: invalid identifier name "->"
    

Proposal B: Introduce a new pipe-variant |>> as syntactic sugar with proper precedence and argument splatting included

It's syntactic sugar for an ordinary pipe which, by expansion, works as a fence for underscores in the same way as ->. It also acts as if the following function were wrapped in a Splat (which will be introduced and exported in Julia 1.9, hopefully together with this proposal!).

Expansion Example

x |>> (map(_, _), _)
#expanding `|>>` to
x |> Splat(->(map(_, _), _))
#expanding headless `->` to
x |> Splat((a1, a2, a3)->(map(a1, a2), a3))

By expanding it in this way, this also solves the disadvantageous precedence interaction between (ordinary) lambdas and piping for free, since it encapsulates the lambda in the Splat call, which prevents -> from consuming (multiple) pipe operators.
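
A small runnable illustration of why the wrapping helps: once the lambda sits inside a call, the trailing pipes are no longer swallowed (Base.splat stands in for the proposed Splat):

1:2 |> Base.splat((a, b) -> a + b) |> inv   # inv(1 + 2) == 1/3; the lambda cannot grab the trailing |> inv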

Example for mixed usage of all approaches & how it is evaluated

(f, A, B) |>> (map(_, _), _) |> Splat(+) |> extrema |>> isodd(_+_)
              (map(f, A), B) |> Splat(+) |> extrema |>> isodd(_+_) #C=map(f, A)
                                     C+B |> extrema |>> isodd(_+_) #D=C+B
                                         extrema(D) |>> isodd(_+_) #(E,F)=extrema(D)
                                                        isodd(E+F)

Benefits

  • purely syntactic
  • probably very easy to implement
  • elegant expansion solves precedence
  • complements |> + #24990 (that proposal for concise single-argument cases and this proposal for simple multi-arg cases)
  • hints at the integrated Splat with the extra β€œfeeding” > (|> vs |>>)
  • non-breaking since it's currently invalid syntax:
    julia> (x,2) |>> abs(_+_)
    ERROR: syntax: ">" is not a unary operator
    

Side Notes

  • this proposal sits somewhere in between generic lambda syntax and #24990 in regard to flexibility and capability. It just has a focus on different trade-offs. Thus, don't evaluate the proposal based on the primary use cases of the other variants! Those are unrelated/orthogonal to this proposal in the same way as the other variants are. And if any certain situation is better solved with any of the other variants, USE THAT!
  • worst-case readability is ALWAYS bad. Don't use it as a metric.
  • if number of characters is counted at all (which probably is generally not a good metric), never compare keystrokes to type but characters to read! We want readable code, not writable code! (at least in all cases where those diverge from each other)
  • extensions to this proposal (like circled numbers) are out of this proposal's scope, since this proposal in no way depends on them. Thus, open another issue for it.

rapus95 avatar Jul 19 '22 12:07 rapus95

If it weren't for #24990, I would support this proposal, but if https://github.com/JuliaLang/julia/pull/24990 merges with the "one function call and zero or more operator calls" semantic, then I see little advantage of this proposed syntax. In the listed examples of why -> is useful, only three cases are not doable with the implicit capture of #24990. The example usage of |>> ((map(f, A), B) |> Splat(+) |> extrema |>> (_+_)/2) is also more simply expressed under #24990 as (map(f, A), B) |> Splat(+) |> extrema |> (_+_)/2.

I think this is an alternative to #24990, not an orthogonal proposal. I would not like to see both merge because that would introduce too much sugar for the same thing, and the marginal utility of one proposal once the other merges is very slight.

Existing (x,y,z) -> y(f(z) + x^2) syntax handles complex cases well, #24990 handles simple filter(_.value > 0, x) cases well, and I don't think there is enough middle ground to warrant the extra syntactic complexity of this proposal in addition to the ->-less version.

The best case I can see is replacing map(x -> exp(x*im), v) with map(->exp(_*im), v), and I don't find it compelling.

LilithHafner avatar Aug 04 '22 23:08 LilithHafner

If it weren't for 24990, I would support this proposal, but if #24990 merges with the "one function call and zero or more operator calls" semantic, then I see little advantage of this proposed syntax. In the listed examples of why -> is useful, only three cases are not doable with the implicit capture of #24990. The example usage of |>> ((map(f, A), B) |> Splat(+) |> extrema |>> (_+_)/2) is also more simply expressed under #24990 as (map(f, A), B) |> Splat(+) |> extrema |> (_+_)/2.

I'll update the examples so that they stand out more (mostly, just replacing operators with functions is enough to make #24990 non-usable). For the pipe example, that's primarily for showing how it would work, not for showing that there are no other ways to write it. Also, how did you get to (map(f, A), B)? The input was a tuple, so either you need to do tuple destructuring beforehand or you cannot do it that way.

I think this is an alternative to 24990, not an orthogonal proposal. I would not like to see both merge because that would introduce too much sugar for the same thing and the marginal utility of one proposal once the other merges is very slight.

Different nesting layers just don't work with the other proposal. And just because the other syntax works in some shared cases doesn't automatically make it particularly readable. In particular, I really like having access to a concise and explicit syntax which doesn't force me into making assumptions about the actual scoping in each case just because I stumbled over an underscore. In other words, what about the people who want to use the non-explicit variant only in very simple single-argument cases to reduce mental load?

Existing (x,y,z) -> y(f(z) + x^2) syntax handles complex cases well, 24990 handles simple filter(_.value > 0, x) cases well, and I don't think there is enough middle ground to warrant the extra syntactic complexity of this proposal and the in addition to the ->less version.

To me, just because both cases are applicable, that's not necessarily a middle ground. It's more like a matter of taste whether you want implicit or explicit scoping, when both do the right thing. Also, it's only that easy if the corresponding type doesn't use a comparator function and accessors. Both are considered to be good Julian style. #24990 is just heavily restricted to operators. So yeah, have a look at the side-notes. There are some things that are certainly more concise in the other syntax approaches. And that's why both proposals aren't mutually exclusive. We often have TIMTOWTDI in Julia. And they complement each other quite well. Both relate to underscores, one being implicit, the other explicit. And the explicit variant has an obvious relation to anonymous functions. So I'd consider it well-rounded.

rapus95 avatar Aug 05 '22 00:08 rapus95