Introduce concept of positional arguments for obviously required arguments
Flux is currently specified without any positional arguments; it only has keyword arguments.
This has the unexpected consequence that argument names are part of the type signature. Users can't name an argument whatever they want; a function implementation must use the same argument names as the function declaration.
For example, the filter function expects a fn argument that has the signature (r) -> bool. Therefore all users must name the argument of their filter functions r.
|> filter(fn: (r) => r._measurement == "mem")
Users cannot use different argument names.
|> filter(fn: (plant) => plant.water_level > 100)
The above is not valid and will produce an error. This is unexpected because it differs from how other languages behave. We also want to enable users to use contextual names like plant to name the context of their data instead of generic names like r.
The proposed solution is to allow positional arguments that are completely independent from keyword arguments. Here is a first draft of the new function argument specification:
Functions can have both positional and keyword arguments. Positional arguments are always required. Keyword arguments may be required or optional.
When calling functions all positional arguments must be placed before any keyword arguments. Positional arguments cannot be called as keyword arguments, nor can keyword arguments be called as positional arguments.
The pipe argument is a required keyword argument that can be implicitly passed using the |> operator.
When defining a function signature users must specify which arguments are positional and which are keyword arguments. TODO: we need to determine the syntax for how this is done.
This change means we need to remove the shortcut syntax that passes keyword arguments implicitly from same-named variables.
The proposal also includes changing some of the builtin functions to expect positional arguments, covering the common places where users will expect to be able to name their argument independently.
The guideline will be to only use a positional argument when the argument is obvious to the application. For example, for the predicate function passed to filter, it's obvious that an argument must be provided. The same goes for the map function.
Note that in cases where a keyword argument is used the names must still match as the name remains part of the type system.
FYI, in Dart the syntax for declaring positional versus named arguments is that positional arguments go first, and then named arguments are enclosed in brackets. Flutter (a framework written in Dart) adds to Dart a @required decorator.
In the following function declaration, documentList is required and positional, and onChangedListener is optional and named:
Future loadDocuments(DocumentList documentList, {Function onChangedListener})
Keyword arguments may be required or optional.
I've always been comfortable with Ruby's syntax for this. Basically -- if a keyword argument is optional, it must supply a default value. Otherwise, omit the default:
def my_function(param_one: "optional", required_param:)
...
end
my_function(required_param: 3) # works
my_function(param_one: "hi") # throws an error
The pipe argument is a required keyword argument that can be implicitly passed using the |> operator.
Not sure of the total implications of the following suggestion, but since we're reopening the positional argument discussion...
I think it would be more ergonomic to adopt a convention that an argument is always piped into the first position. That is, the following two lines are equivalent:
tables |> filter(fn: (plant) => plant.water_level > 100)
filter(tables, fn: (plant) => plant.water_level > 100)
I think this meshes well with the guidance to only use a positional argument when the argument is obvious to the application. At least to me, if a function is to participate in a pipeline, the data it is transforming should be obvious. This convention exists in Elixir, and it feels very natural. We can design functions to participate in pipeline transformations without a ton of noise:
def add_5(number) do
number + 5
end
# used like
20
|> add_5()
|> IO.puts()
One could also define this as:
def add_value(number, value) do
number + value
end
# used like
20
|> add_value(5)
|> IO.puts()
Elixir's syntax for keyword arguments makes use of syntax sugar atop its keyword lists:
def add_value(number, value: value) do
number + value
end
# used like
20
|> add_value(value: 5)
|> IO.puts()
@lukevmorris I agree, I was also thinking we could make the pipe argument the first argument, as that is its definition in most other languages.
The Flux behavior for required vs not is just like you describe. If it has a default value it is not required.
Doing both of these would mean that we can drop the special =<- syntax since a pipe forward argument must always be first. I see that as a win.
In a function definition how would we determine if an argument is positional or keyword? Keep in mind that we want keyword to be the default, meaning a user must explicitly ask for a positional argument.
I am thinking something like using # to mark an argument as positional.
my_func = (#param_one, kw_required, kw_optional=5) => ...
Where param_one would be positional and the others are keyword args.
Take for example the increase function that is defined in Flux currently as
increase = (tables=<-, columns=["_value"]) => ...
It would change to
increase = (#tables, columns=["_value"]) => ...
The filter function looks like this currently
builtin filter : (tables=<- stream, fn: (r: record) -> bool ) -> stream
We could update it to this:
builtin filter : (#tables: stream, #fn: (#r: record) -> bool ) -> stream
This new type signature would allow it to be called like this:
from(bucket:"garden")
|> range(start: -1m)
|> filter((#plant) => plant.is_alive)
The #plant part is a bit annoying; we might be able to allow for # to be optional in a few specific cases. Then it becomes:
from(bucket:"garden")
|> range(start: -1m)
|> filter((plant) => plant.is_alive)
I am thinking we can make # optional in cases where it's not ambiguous. For example, the filter function expects a predicate function with one positional argument. So if we find a function with one keyword argument we can implicitly convert that to a single positional argument. I am not sure if that is generally applicable, but it is worth exploring.
Is it valuable to be able to specify positional and named arguments? Specifically will I ever want to call a function using a mix of positional and named like so:
f(a, b, c: 5, d: 10)
It seems like either a function will be called entirely with positional arguments, or entirely with named arguments. Please correct me if my assumption is incorrect, but perhaps then we don't need special syntax for specifying whether a parameter is positional or named?
When a function is defined, implicitly all arguments are ordered. The order matters when calling a function with positional arguments, however you may call a function using named arguments in which case the order doesn't matter. Functions with optional parameters must be called using named arguments.
This way one defines a function in the canonical way:
f = (a, b, c) => a + b + c
but may call it in multiple ways:
x = f(0, 1, 2)
y = f(a: 0, b: 1, c: 2)
z = f(b: 5, a: 6, c: 3)
Note that in order to handle pipe arguments we could require that pipe arguments always be passed into functions using the |> syntax:
f = (x=<-, a, b) => x + a + b
0 |> f(a: 1, b: 2)
0 |> f(1, 2)
f(x: 0, a: 1, b: 2) // type error - x is a pipe parameter; cannot pass as named arg
f(0, 1, 2) // type error - x is pipe parameter; cannot pass as positional arg
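For comparison, ordinary Python parameters already behave the way this comment describes for non-pipe arguments: the definition fixes an order, and callers may pass arguments by position or by name. This is only an analogy in Python, not Flux semantics.

```python
def f(a, b, c):
    return a + b + c


x = f(0, 1, 2)           # all positional: order matters
y = f(a=0, b=1, c=2)     # all named: same binding as x
z = f(b=5, a=6, c=3)     # all named: order at the call site is irrelevant
```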
If I may potentially introduce another syntax for positional arguments, I kind of like this:
builtin filter : (tables: stream=#1, fn: (r: record=#1) -> bool = #2 ) -> stream
An alternative to # can be $ or @. The $1 form is much more common in things like bash scripts and other scripting languages. It would indicate that this is the positional argument. Another benefit is that it makes explicit which numbered argument it is. The function takes as many positional arguments as the maximum number used. It would also allow a single positional argument to be used for multiple arguments and would help prevent removing a parameter and breaking the number of positional arguments. I think this could also be implemented within the parser itself so it wouldn't require any change to the AST.
Also, every time I write a # sign on GitHub it pops up an issue list, so if we're going to choose a random character I'd like @ or $ for the unrelated reason that it'll make it easier to type GitHub issues without typing escape all the time.
My guess is that the place where it would be common to mix positional and named arguments would be specifically for optional arguments. As in the required ones would be positional and named ones for optional. Otherwise, I agree that functions would likely be all one or the other.
@nathanielc, I'm not sure that we have to ditch the shortcut syntax for named arguments and ones having the same variable name. Because positional arguments must come first and are always required, you always know which arguments are positional when calling (from a type checking perspective).
For example this should all work:
fn = (foo, bar=, asdf="jkl") => { ... }
asdf = "hello"
fn(asdf, bar: "hi")
fn(asdf, bar:"hi", asdf)
fn("hi", bar: "hi", asdf)
fn(asdf, asdf, bar: "hi")
fn("hi", asdf, bar:"hi")
The downside is that as a reader of the code, I can't tell from the function call which arguments are positional and which are named because of the shortcut. But I'm not sure that's very important since we already have positional arguments, which means that I'll have to look up the function signature to really know what's going on anyway.
I'm not sure I like the # syntax for specifying positional. Mainly for the reason @nathanielc pointed out about the type signatures for anonymous functions passed to things like filter and map. In my example above I got around it by having required named arguments include the trailing =, kind of like in @lukevmorris's example (but with = instead of : because we already use =).
Overall, I think adding positional arguments would be a good thing. I'd push something in the style guide to generally never have more than 1 or 2 positional arguments (and only 2 if you're using the first as a pipe forwardable one).
Actually, back to the first arg being the pipe forward one. Does this strike anyone as weird?
fn = (tables, foo) => {}
from() |> fn(23)
// same as
fn(from(), 23)
The two different ways of calling have two different positional arguments. Now that I'm looking at it I'm thinking it's fine.
So I'm +1 on positional, +1 on having the pipe forwardable argument be the first, and wondering what the exact syntax will be on the function definition for positional vs. named. Although I'm leaning towards @lukevmorris's or what I used above.
@pauldix I'm still having trouble understanding the use case for mixing named and positional arguments.
fn("hi", bar: "hi", asdf)
seems really confusing to me. In my opinion, if it makes sense to call a function with some named arguments, then it makes sense to call it with all named arguments.
Again I could be completely off base, but if this is indeed the case (hey that rhymes), then defining function parameters to be implicitly positional, and allowing users to call functions using either all positional or all named arguments, not only solves the filter/anonymous function issue, but it also doesn't invalidate any existing Flux source as functions are defined exactly as they are now.
f = (a, b, c=1) => ...
@jlapacik that example was supposed to be a degenerate case where I mixed a positional, required named, and optional named. I assume most people won't be doing that. But mixing positional and optional named parameters would be quite useful. Take limit as an example. The number to limit by is required and the offset is optional. So that function definition would be:
limit = (tables, n, offset=0) => {...}
// calling looks like
from() |> limit(10)
// or if you want an offset
from() |> limit(10, offset: 20)
// or if you want to explicitly pass the pipe forwardable argument
limit(from(), 10)
I imagine there are a good number of functions that would have optional parameters mixed with required ones. From is a good target:
from = (bucket, host="localhost", org=context.org, token=context.token) => {...}
// so most of the time I can call
from("mybucket") |> ...
// but if I'm pulling from another org
from("somebucket", org: "paulco", token: "my rad token") |> ...
Although I just realized I'm not sure how execution context arguments can be specified, so I suppose that's something we probably need to figure out. I just pretended that there was some global object called context in my example.
This is of course assuming that we move to having the pipe forwardable argument be the first. If that's not the case then you'd have:
limit = (n, offset=0, tables=<-) => {...}
As for it being a breaking change, if we make the move of the pipe forwardable argument to be the first, that'll break everything anyway.
I agree with @pauldix that it will be common to have both positional and named arguments. Take for example any of the transformation functions: they will all likely have a single positional argument of tables that is used for |>, and then the rest of their args will be named and some of those will have defaults.
On the other hand, allowing calls to functions to use either the positional or named syntax creates challenges with type inference and readability, since it's ambiguous whether the caller wants to pass in named or positional arguments. See this example:
foo = (f) => {
x = 0
y = 1
f(x,y)
}
What is the type of the function f in the above example? Is it a function with two positional arguments or a function with two named arguments x and y?
If we remove the syntax to allow calling named arguments as positional arguments then the ambiguity is removed.
foo = (f) => {
x = 0
y = 1
f(x:x,y:y)
}
Callers of foo now know they need to provide a function f with two keyword arguments x and y.
foo = (f) => {
x = 0
y = 1
f(x,y)
}
If the above is interpreted as strictly being positional, then callers of foo know they need to provide a function f with two positional arguments (and can therefore name them whatever they want).
I think the style guide will recommend using positional arguments for higher-order functions (i.e. functions passed as arguments) and otherwise encourage 1 or 2 positional arguments at most.
Wouldn't the type of f have to be specified? Like:
foo = (f(x=, y=)) => {
// stuff here
}
@pauldix No, Flux doesn't require that you specify the types of anything. A default value can be used to hint at what the expected type is, but it's not feasible in all cases. For example, if the filter function were specified in pure Flux it would look something like this:
filter = (tables, fn) => {
tables |> flatMap((r) => if fn(r) then [r] else [])
}
In the above the fn function would be inferred to be a type of (#r) -> bool where r is a positional argument, so it could be named anything.
In contrast, if we defined filter like this (look at the call to fn, it's the only change):
filter = (tables, fn) => {
tables |> flatMap((r) => if fn(r: r) then [r] else [])
}
then the fn function would be inferred to have a type of (r) -> bool where r is a named argument and would always need to be named r. Obviously we want the first definition of filter. If we don't have a difference between positional and named arguments at call sites, it's not possible to infer which type of function is expected.
It doesn't make sense to use a default value for the fn function in filter, so there is no way to specify the function signature other than its call site.
I was thinking not of specifying the type down to the actual argument types, but only the number of arguments and which named ones exist. For example, your definition of filter includes those things, despite not specifying the types of tables and fn. What would happen in a case like this?
foo = (r, merge=true) => { ... }
filter = (tables, fn) => {
tables |> flatMap((r) => if fn(r) then [r] else [])
}
from() |> filter(foo) |> ...
Would the type checker automatically resolve that despite the body of filter not having any reference to the named parameter merge that our other function definition has?
I'm ok with killing the shortcut syntax for named arguments in function calls if it doesn't make sense, just curious if there's a sensible way to keep it.
Also, do we have an example anywhere in the stdlib of a pure Flux function that takes an anonymous function? The only ones I could think of are all defined in Go.
@pauldix Yes, you can pass any function as an argument to another function so long as it is safe to call that function the way it is called in the body of the outer function. Since you would expect to be able to call foo(r: {...}) and leave off the merge argument since it has a default, it is safe to pass foo into filter as the fn argument in this case. We even have a test case for exactly that here https://github.com/influxdata/flux/blob/master/semantic/inference_test.go#L682.
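The same "safe to call the way the outer function calls it" idea can be illustrated in Python. This is a loose analogy, not the Flux type checker: filter_rows is a hypothetical stand-in for filter, and foo carries an extra optional parameter that the caller never mentions.

```python
def filter_rows(tables, fn):
    # The body only ever calls fn with a single argument.
    return [r for r in tables if fn(r)]


def foo(r, merge=True):
    # merge has a default, so calling foo(r) alone is fine.
    return r["valid"]


rows = [{"valid": True, "id": 1}, {"valid": False, "id": 2}]
kept = filter_rows(rows, foo)
```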
Also just thinking about readability. Wouldn't function definitions that define the arguments for passed in functions be more readable/understandable? That way the user doesn't have to inspect the function body to know what the passed function signature should look like
Also, do we have an example anywhere in the stdlib of a pure Flux function that takes an anonymous function? The only ones I could think of are all defined in Go.
We have a few https://github.com/influxdata/flux/blob/master/stdlib/universe/universe.flux#L183 and https://github.com/influxdata/flux/blob/master/stdlib/universe/universe.flux#L111 and probably some others I am not remembering.
Also just thinking about readability. Wouldn't function definitions that define the arguments for passed in functions be more readable/understandable? That way the user doesn't have to inspect the function body to know what the passed function signature should look like
Yes, we are planning on adding type expression logic to the language so people can be explicit about the types of functions. This will be optional, as we do not want users to be required to understand types just to be able to write a simple query. But we do expect that library authors will be able to add type annotations.
Also keep in mind that even if the type annotation does not exist, the LSP will know what the type is and can provide that information to a user when they are writing the code for auto-completion etc.
@pauldix @nathanielc just to clarify, I was not advocating for changing how pipe arguments are specified. I think pipe arguments should be treated specially as they are now. My concern is that the same issues surrounding closures still exist with this change.
Right now, because filter's anonymous function is defined to take a parameter named r, a user cannot substitute it with a function whose parameter is named anything other than r.
f = (r) => true
g = (s) => false
filter(fn: f) // works
filter(fn: g) // doesn't
Users find this very unintuitive. However under the current proposal, if a closure is defined with positional parameters, then a function that is defined with named parameters cannot be used in its place.
e = (#r) => true
f = (#s) => true
g = (r) => false
h = (s) => false
filter(fn: e) // works
filter(fn: f) // works
filter(fn: g) // doesn't
filter(fn: h) // doesn't
My concern is that users will find the above examples just as unintuitive as they do with named arguments.
On top of that, one of the main criticisms that I have heard about flux is that it is too verbose. Under the current proposal we are adding even more verbosity to function expressions as now users must specify which arguments are positional with additional syntax.
@jlapacik OK that makes sense. I think I misunderstood your original comment here then https://github.com/influxdata/flux/issues/1997#issuecomment-552620117
You are saying that if we restrict function calls to be either all positional or all named then we can correctly unify function definitions and allow all of the above cases to work. This is because we treat function definitions as if all arguments are positional, but then make it valid to call positional arguments using the named call syntax.
I agree, and I think we can make one small adjustment to get both simple function definitions and mixed positional and named arguments at call sites.
I want both because I think it will be common for function calls to mix positional and named arguments. Specifically, I expect it will be common to want to change a single optional argument without having to name every argument or pass every argument positionally. This is because we make heavy use of optional arguments in the stdlib.
OK so how could this work?
The crux of the issue is that we need to make function unification work between two cases:
- Unifying function definitions with their corresponding calls
- Unifying two function definitions together when functions are passed as values to other functions
My new proposal is this:
- The function type has the following definition:
type FunctionType struct {
positional []struct {
Type MonoType
Required bool
}
named map[string]struct {
Type MonoType
Required bool
}
// the actual structure could be different and use different structures etc,
// this is just how I organized it in my head.
}
- All function definitions define a function type where all arguments are considered positional. Arguments that are optional are considered positional but are flagged as optional. This means the field named is always empty when creating a function type from a function definition.
- Function calls create function types where the arguments are specified as positional or named based on the syntax. All arguments are marked as required.
- The pipe argument must always be the first positional argument (this simplifies function unification and function definition syntax as =<- is no longer needed).
When unifying function types we follow these rules:
- The intersection of positional arguments must unify. By intersection I mean the smallest list of positional arguments. So if one function type has only 1 positional argument defined while the other has 4, then only the first positional argument must unify.
- Any remaining positional arguments must unify with a corresponding named argument. If any required positional argument does not have a corresponding named argument, unification fails.
- All remaining required named arguments must unify with corresponding named arguments either required or optional.
- All remaining optional named arguments must unify with corresponding named arguments if present.
- There is no special logic for pipe arguments since if they exist they are the first positional argument.
This should enable all of the following cases to work:
filter = (tables, fn) =>
tables |> flatMap((r) => if fn(r) then [r] else [])
g = (r) => r.valid
g({valid: false})
g(r: {valid:true})
h = (s) => s.has_ice
h({has_ice: false})
h(s:{has_ice: false})
i = (t, u=0) => t.x > u
i({x:1})
i({x:1},2)
i(t:{x:1},u:2)
filter(fn: g) // works
filter(fn: h) // works
filter(fn: i) // works, because only one positional arg is required
The implication of this change is that how functions are called will determine the available function implementations that can be used. For example, if filter had called the fn as fn(r:r) then only function definitions using the argument named r would unify. Because of this it will be best practice to call function values using the positional syntax.
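The unification rules listed above can be sketched roughly as follows. This is a simplified Python model, not the Flux implementation: types are plain strings with "?" standing in for an unconstrained type variable, and positional entries carry an optional name so that leftover positional arguments can be matched against named ones. That name field is an assumption on my part; the struct in the proposal does not include it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Param:
    type: str                    # simplified: "?" unifies with any type
    required: bool = True
    name: Optional[str] = None   # assumption: definitions keep the parameter name


@dataclass
class FunctionType:
    positional: List[Param] = field(default_factory=list)
    named: Dict[str, Param] = field(default_factory=dict)


def types_unify(a: str, b: str) -> bool:
    return a == "?" or b == "?" or a == b


def unify(x: FunctionType, y: FunctionType) -> bool:
    # Rule 1: the shared prefix of positional arguments must unify.
    shared = min(len(x.positional), len(y.positional))
    for p, q in zip(x.positional, y.positional):
        if not types_unify(p.type, q.type):
            return False

    # Rule 2: leftover positional arguments (only one side can have any)
    # must match a named argument on the other side, unless optional.
    longer, shorter = (x, y) if len(x.positional) > len(y.positional) else (y, x)
    longer_named = dict(longer.named)
    shorter_named = dict(shorter.named)
    for p in longer.positional[shared:]:
        match = shorter_named.pop(p.name, None) if p.name is not None else None
        if match is None:
            if p.required:
                return False
        elif not types_unify(p.type, match.type):
            return False

    # Rules 3 and 4: remaining named arguments must unify when both sides
    # have them; a required one with no counterpart fails.
    for name in set(longer_named) | set(shorter_named):
        a, b = longer_named.get(name), shorter_named.get(name)
        if a is not None and b is not None:
            if not types_unify(a.type, b.type):
                return False
        else:
            present = a if a is not None else b
            if present.required:
                return False
    return True


# Call sites inside filter's body.
call_positional = FunctionType(positional=[Param("record")])   # fn(r)
call_named = FunctionType(named={"r": Param("record")})        # fn(r: r)

# Definitions: every parameter is positional, optional ones are flagged.
def_g = FunctionType(positional=[Param("?", True, "r")])                          # g = (r) => ...
def_h = FunctionType(positional=[Param("?", True, "s")])                          # h = (s) => ...
def_i = FunctionType(positional=[Param("?", True, "t"), Param("?", False, "u")])  # i = (t, u=0) => ...
```

With this model, the positional call unifies with g, h, and i, while the named call fn(r: r) unifies with g but not with h, which is the behavior the comment describes.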
I forgot to add a discussion about what this change means for backwards compatible Flux code.
In the current Flux spec with everything being a named arg, you can always add new args to a function in any position so long as the new arg is optional (i.e. has a default).
With this change we have the same ability to add new optional args, but they must always be added last. Argument order cannot be changed ever to maintain backward compatibility. This is normal and expected in most languages and so I don't see this as a significant loss.
Also if we automatically check APIs for backwards compatibility when a package is published this is even less of an issue since we can prevent mistakes from being published.
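An automated compatibility check along these lines could look something like the sketch below. It is a hypothetical Python model where a signature is a list of (name, has_default) pairs; the timeout parameter is made up for illustration and is not part of Flux's limit.

```python
def is_backward_compatible(old, new):
    """Return True if every call valid against `old` stays valid against `new`."""
    if len(new) < len(old):
        return False  # removing a parameter breaks existing callers
    for (old_name, old_default), (new_name, new_default) in zip(old, new):
        if old_name != new_name:
            return False  # renaming or reordering breaks existing calls
        if old_default and not new_default:
            return False  # an optional parameter became required
    # Anything appended at the end must be optional.
    return all(has_default for _, has_default in new[len(old):])


limit_v1 = [("tables", False), ("n", False), ("offset", True)]
limit_v2 = limit_v1 + [("timeout", True)]                            # hypothetical new optional arg
limit_reordered = [("tables", False), ("offset", True), ("n", False)]
limit_new_required = limit_v1 + [("timeout", False)]
```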
I assume in your example you could also call from() |> filter(g)?
I like it and I agree that it will be very common to mix positional arguments and named optional arguments.
I'm not sure I understand how positional arguments can be marked as optional, can you clarify that?
@pauldix Yes, that is correct.
I'm not sure I understand how positional arguments can be marked as optional, can you clarify that?
The order still matters, so all optional args have to be defined after any required args. Since you can call functions with either positional or named args, it will be best practice to call them using named args. But by defining optional args as positional it allows you to pass an implementation of a function that expects positional args.
For example:
foo = (f) => f(1, 2)
cmp = (t, u=0) => t > u
foo(f: cmp)
Does that help?
Ah, I think I get it. I was reading that positional arguments (without a name) can be optional, but they can't. But does this mean position matters even for named arguments when calling? For example, can I do this or will it throw an error on type checking?
foo = (a, b=1, c="default") => {}
foo(23, c: "hello", b: 3)
Oh, also adding this
foo(23, 57, b: 2) // error because I tried to pass b twice
foo(23, 57, c: "hi") // this should be good
That is correct
Heh, wait, so the top one won't throw an error? That is
foo = (a, b=1, c="default") => {}
foo(23, c: "hello", b: 3) // this is all good
@pauldix Correct, basically we make sure each arg matches up. We first match up positional args. Then we match up named args. Then we check for any missing required args.
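The three matching steps just described can be sketched as a small binder. This is a hypothetical Python model of the call check, not the Flux type checker; bind and REQUIRED are names invented for the illustration, and a signature is a list of (name, default) pairs.

```python
REQUIRED = object()  # sentinel marking a parameter that has no default


def bind(params, args, kwargs):
    names = [name for name, _ in params]
    if len(args) > len(params):
        raise TypeError("too many positional arguments")
    # Step 1: match up positional args in declaration order.
    bound = dict(zip(names, args))
    # Step 2: match up named args, rejecting unknown names and duplicates.
    for name, value in kwargs.items():
        if name not in names:
            raise TypeError(f"unknown argument {name}")
        if name in bound:
            raise TypeError(f"argument {name} passed twice")
        bound[name] = value
    # Step 3: fill in defaults and check for missing required args.
    for name, default in params:
        if name not in bound:
            if default is REQUIRED:
                raise TypeError(f"missing required argument {name}")
            bound[name] = default
    return bound


# foo = (a, b=1, c="default") => {}
foo = [("a", REQUIRED), ("b", 1), ("c", "default")]

ok_mixed = bind(foo, [23], {"c": "hello", "b": 3})    # foo(23, c: "hello", b: 3)
ok_positional = bind(foo, [23, 57], {"c": "hi"})      # foo(23, 57, c: "hi")

try:
    bind(foo, [23, 57], {"b": 2})                     # foo(23, 57, b: 2)
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True
```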
Awesome, I love it. SHIP IT!
@nathanielc @jlapacik I like how you worked that out. Especially
The pipe argument must always be the first positional argument (this simplifies function unification and function definition syntax as =<- is no longer needed)
This is great and would make a |> f() only syntactic sugar for f(a). Very intuitive and readable.
I also think that your pseudo-algorithm for function unification could definitely work.
In reading this, I was having a look at how python does it, because they allow a mixture of positional, named, optional and required: https://linux.die.net/diveintopython/html/power_of_introspection/optional_arguments.html.
This looks totally whacked until you realize that arguments are simply a dictionary. The “normal” method of calling functions without argument names is actually just a shorthand where Python matches up the values with the argument names in the order they're specified in the function declaration. And most of the time, you'll call functions the “normal” way, but you always have the additional flexibility if you need it.
It seems similar to what we are saying here 🤔
As for backward compatibility, @nathanielc, yes, you are right, but I love how this would not imply any change in our current Flux code (just underlining that), because every call we are doing now is just a particular case of function invocation with all-named args.
Another thing I was thinking about, is that I love how our type system would infer the type of passed functions if you used named args:
f = (a, b, fn) => a + fn(arg: b)
f(1, 3, (arg) => arg * 2) // this is ok, because `fn` has a named argument `arg`.
f(1, 3, (a) => a) // errors because `fn` does not have any `arg` argument.
^^ @jlapacik
So, if your function takes functions in, you can be very explicit on which interface you are accepting at compile time. A future work on this, could be to increase readability for users. Something like:
f = (a, b, fn) where fn: (arg, ...) => {}
Anyways, that's great, looking forward to making this addition 🍾
https://github.com/influxdata/flux/issues/1997#issuecomment-553625028 @nathanielc
How would the following example work out?
f = (arg=(x, y) => x) => arg(x: 1, y: 2)
z = f(arg: (y, x) => x - y) // If x and y are named this would result in -1 (what happens today), if they are positional the answer is 1
z2 = f(arg: (b, a) => a - b) // If the names do not match then it would clearly be positional and the answer is 1
If we force positional arguments to be explicitly specified as such we could avoid this (but just because it effectively labels positional arguments with 0,1, 2 etc) but if we try to match arguments during inference I fear we may always end up with ambiguous situations like the above.
However if we do explicitly specify positional arguments we are forced to make breaking changes on filter etc, as they would no longer be possible to call with a named argument.
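The ambiguity in the z example can be reproduced in Python, where both call styles exist. This is only an analogy: f_named and f_positional are hypothetical stand-ins for the two possible readings of how f calls arg.

```python
def f_named(arg):
    # Reading 1: x and y are matched by name.
    return arg(x=1, y=2)


def f_positional(arg):
    # Reading 2: the arguments are matched by position.
    return arg(1, 2)


def swapped(y, x):   # (y, x) => x - y
    return x - y


def renamed(b, a):   # (b, a) => a - b
    return a - b


z_named = f_named(swapped)            # x=1, y=2, so 1 - 2
z_positional = f_positional(swapped)  # y=1, x=2, so 2 - 1
z2 = f_positional(renamed)            # b=1, a=2, so 2 - 1
```

The same function value gives -1 under the named reading and 1 under the positional reading, so the caller of f has to know which style the body uses. Under the named reading, renamed cannot be passed at all because it has no x or y parameters.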
Wouldn't z2 be invalid? Since the function is being called with named args, won't you be forced to provide a function that uses the same names?
This still seems like a problem and I don't have an answer to it, as z is a very confusing example.
In the original comment I had recommended that functions that take a function as an argument use positional style to call the function; this way you avoid some of this confusion. But that is just a recommendation. If the type system can't be sound in all cases then we shouldn't change it.
As for making positional arguments always be called positionally (i.e. giving them labels 0, 1, 2), I don't think that is a change we want to make. The named-args-only behavior has worked well so far. So I would only want to introduce positional args if we can do so in a way that allows for backwards compatibility.