Support optional and named arguments
That means overloading based on arity will not be supported.
Hi! Sorry if I lack the experience to make a meaningful dent here, but if possible, could you point me to where I could look to attempt this? I'm trying to make sense of flatten.hs now. If you can't provide that direction, that's OK! I will keep trying to learn this in my spare time.
I took declarative programming last semester, and I was very interested in Wybe, which I stumbled upon during the course.
Hey @Chillerbag sorry for the late reply!
At a high level, the flatten module converts function calls and other expressions into procedure calls. For instance, something like
foo(bar(baz)) = ?quux
would get flattened into something that looks like
bar(baz, ?tmp0)
foo(tmp0, ?quux)
Transformations are defined for all expressions, such as let. After this pass runs, the AST that remains should contain only statements, use-blocks, conditionals, loops, constants, and variables; the pass does not change the semantics of the program. Future compiler passes exploit this simpler refinement of the AST.
As for how to implement optional arguments, I'm not exactly sure. One possible implementation strategy I've come up with is as follows:
- for each "optional" parameter, treat the default value as an expression, and apply type checking, etc, to the code that generates the value of that expression
- in type checking a call with optional parameters, if the corresponding argument isn't present, just ignore that parameter; if it is present, ensure it's correctly typed
- in the clause pass (AST -> LPVM) or somewhere similar, where you replace a call to some proc, precede that call with any code required to generate the default parameters, and pass those as arguments if not already present (see the sketch below) -- care must be taken there to use fresh variables throughout the newly added code
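A minimal sketch of that rewriting (incr is a made-up proc here, and tmp0 stands for a fresh variable):
def incr(x:int, step:int = 1, ?r:int) { ?r = x + step }
incr(n, ?m)
would become, once the clause pass fills in the default:
?tmp0 = 1
incr(n, tmp0, ?m)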
In this way, LPVM stays "simple" and doesn't require you to build any additional constructs in the IL. (This is good practice to maintain, as it makes the complicated future passes a fair bit simpler!) You'd need to handle higher-order code too, but I think it's easiest to simply not allow optional parameters in HO code, and just use the default value when generating the trampoline.
This doesn't handle named arguments, but I think we could implement that similarly: just allow the named argument to select a different parameter in type checking. I think optional arguments alone are a big enough feature, so feel free to handle one and not the other.
If you have any questions, feel free to ask more. Happy to see where this goes!
Thanks, James, for giving a detailed reply!
One other issue that comes into this is that Wybe allows "backwards" dataflow. For example, a call like
foo(bar(?baz)) = quux
is permitted, deriving baz from quux. This might be used when foo and bar are constructors and quux is a variable. This will get translated the same way James explained during flattening, but then the type checker will reverse the order of the calls (during mode checking) to look like:
foo(?tmp, quux)
bar(?baz, tmp)
The other complication here is partial application. I actually started working on this issue, and stopped because I got stuck handling partial application. As I recall, the problem was that partial application is handled earlier in the compiler than the point where I was handling optional arguments, so by the time my code ran, a call with missing arguments had already been interpreted as a partial application. In my experience, most of the hard problems arising while working on the compiler come down to earlier parts of the compiler interfering with what later parts need.
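To make the conflict concrete, consider a hypothetical
def f(x:int, y:int = 1, ?r:int) { ?r = x + y }
?g = f(5)
By the time my optional-argument handling ran, f(5) had already been interpreted as a partial application (a closure with x bound to 5), so it could no longer be read as the full call f(5, 1, ?g) with the default filled in.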
My work on this is in the optional_args branch, which you're welcome to branch from if you want to continue that work. You're also welcome to start over if you prefer. Or you might find it easier to start with a simpler issue, and come back to this when you're more familiar with the various phases of the compiler.
But while I'm writing about this, I'll add a few notes for future reference:
- It's tempting to handle optional arguments by just generating inlined procedures with fewer arguments that call the procedure/function with the default arguments added on the end (sketched after these notes). That would certainly be a lot easier than what I was doing (though it would still run into conflicts with partial application). But I can't see how it can be generalised to handle named arguments, which would also be really nice to have for procs with a lot of optional arguments.
- It would also be nice to allow the default value for an optional argument to be an expression mentioning arguments appearing earlier in the argument list (e.g., the default value for argument 3 could be whatever was passed as argument 2). This could be handled by letting the default value expression refer to the names of earlier arguments, including ones that themselves have defaults (also sketched below).
- It could be useful to use a resource in the default value for an optional argument, and then not consider that to mean the resource is used by that proc itself, but consider a call to that proc that relies on the default to use that resource. That could be useful, for example, for a text-writing proc with an optional argument for the output stream whose default is the "current" output. Then we could have a resource holding the "current" output stream, and a single procedure used both for writing to the current output stream and for writing to a specified one. Calls to it would only be considered to use the current output stream if they actually do. When reporting a resource error for a call to a proc that only uses the missing resource because it's a default argument, the error message should indicate why the error happened.
- Just a thought: if named arguments are handled, a bool-valued named argument with a default of false could be passed by specifying just the argument name, a bit like shell command-line flags. E.g., a call to a print proc with a default argument newline=false might be written print(x, newline) instead of print(x, newline=true). I can imagine this being really nice where there are lots of default named arguments. OTOH, it may be a bit confusing, so I'm not sure if it's a good idea.
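To illustrate the first two notes with a hypothetical sketch (scale and span are made-up procs):
def scale(x:int, factor:int = 2, ?r:int) { ?r = x * factor }
# the inlined-wrapper approach would, in effect, generate:
def scale(x:int, ?r:int) { scale(x, 2, ?r) }
def span(lo:int = 0, hi:int = lo + 100, ?width:int) { ?width = hi - lo }
span(5, ?w)  # w == 100: hi defaults to lo + 100 == 105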
Wow @pschachte @jimbxb thanks for the amazing detail! The backwards dataflow is super interesting, and I totally see the logical conundrum of how the missing args would unavoidably be seen as partial application. This seems like a very difficult problem to solve. I will poke around and see whether and how other languages deal with this problem. I think R does this in an interesting way: https://www.rdocumentation.org/packages/purrr/versions/0.2.4/topics/partial. Your first 3 options @pschachte all sound very compelling. I'll play with and read the code in your branch, but you may be right that it could be easier to look at a different issue first. I'll keep you updated!
My thought for disambiguating was to just not allow partial application of procedures/functions with optional arguments. I don't know R, so I don't understand its approach, but the mention of an explicit environment argument makes me suspect that approach wouldn't work for Wybe. But I'm open to being convinced.
I agree that handling partial application is a tricky case. I think we should allow partial application of procs with optional arguments; however, we shouldn't consider the optional arguments for those procs, and instead use the default values (inside the generated trampoline!).
def f(x:int, ?y:int, z:int = 10) { ?y = x + z }
map(f, [1, 2], ?ys)
# ys == [11, 12]
I think this is a good choice, as one can always pass the optional arguments by wrapping the proc in an anonymous proc and explicitly labelling the arguments.
def f(x:int, ?y:int, z:int = 10) { ?y = x + z }
map({ f(@, ?@, 20) }, [1, 2], ?ys)
# ys == [21, 22]
For named arguments, I think there's a natural implementation that allows named arguments to be passed in partial applications. In the trampoline, we'd be able to unpack the passed value as we do for closed variables (there's some subtlety here, as the arguments aren't unpacked as the leading arguments normally are).
def f(x:int, ?y:int, z:int = 10) { ?y = x + z }
map(f(z=30), [1, 2], ?ys)
# ys == [31, 32]
I don't think there's any ambiguity about what this should mean, so I think it's appropriate shorthand for wrapping the function in an anon proc.
I think it would be nice to support default values for in-out parameters too. For instance,
def reverse(xs:list(T), !r:list(T) = []) {
    for ?x in xs {
        ?r = [x | r]
    }
}
reverse([1, 2], ?xs) # xs == [2, 1]
reverse([3, 4], !xs) # xs == [4, 3, 2, 1]
I don't think this is the best example, as I'm not sure if you'd ever want to pass a non-empty list here, but I hope this illustrates my point.
For the implementation of this, if we go with the approach where we generate code for optional args at the call site, then we can just pass in the default value when the in-out argument is used as just an output. (If we go with the inlined implementation, which I'm not a fan of, then we can generate an extra proc where this argument is an output, and have the trampoline pass in the default.) I haven't thought about HO with this case, but I imagine we would want to disallow this argument as an argument for the HO proc.
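Concretely, reusing reverse from above (tmp-free because xs itself can hold the default), a call that leaves the in-out argument as just an output, like
reverse([1, 2], ?xs)
would be rewritten at the call site to bind the default first:
?xs = []
reverse([1, 2], !xs)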
I do see the possibility of handling default values for out params as well. For instance,
def f(x:int, ?y:int = 10) {
    # y is initialised to 10 here
    if { x > 0 :: !y += x }
}
f(1, ?y) # y == 11
f(-1, ?y) # y == 10
If we go down the implementation route wherein we generate code at the call-site for optional args, then I don't know how we could implement this. This is because we have no way of passing in the value of 10 to the proc, as f only has one argument when it gets down to the LLVM level. We could generate an in-out parameter here, and require the caller always pass 10, but I don't know if I like that.
Yes, on reflection, I think you're right that procs with default arguments can be partially applied. Usually. The key is the type system. E.g., ?z = f(x,y) can be interpreted as f(x,y,?z), or f(x,y,*default1*,?z), or ?z = f(x,y,@) (or versions with more defaults and more @s), with the type system used to disambiguate. In most cases, only one will make sense to the type system. When more than one is valid, I think they should be preferred in the order listed above. Note that overloading further complicates this, since procs with this name and multiple arities and types could be imported from multiple modules.
Currently, though, I believe that conversion from proc call to closure assignment happens during flattening, which is too early to use the type system to disambiguate. I think the right solution is for flattening not to do anything like this, and for the type checker to be prepared to interpret f(x,y,?z) as any of the things listed above, in that order. I.e., apply the first interpretation and type check, and only if that fails completely (for all overloaded procs with matching name) go on to the second interpretation, and only if that fails, go on to the third.
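For concreteness, given a hypothetical
def f(x:int, y:int, w:int = 0, ?r:int) { ?r = x + y + w }
?z = f(1, 2)
the interpretations would be tried in that order: f(1,2,?z) fails (no f takes exactly two inputs and one output), f(1,2,*default 0*,?z) type checks, giving z == 3, and the closure reading ?z = f(1,2,@) is never reached. If z were instead constrained elsewhere to a higher-order type, only the closure interpretation would type check.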
I think a default for an in/out parameter does make sense, using your interpretation: the default supplies a default initial value. But I don't think a default for an out-only parameter makes as much sense. An input value for an out-only value should be a value to be unified with, making the proc call semidet. I can see how what you're saying makes sense from the point of the proc being defined, but not from the perspective of the caller.
IMO, ?z = f(x,y) should always be equivalent to ?z = f(x,y,*default*), as passing the default value explicitly should not change the semantics of the program. If we don't allow default arguments for HO arguments, then I don't think there's any ambiguity here at all!
I think ?z = f(x, y, @) would be an illegal term, but ?z = @(f(x,y,@)) should be okay and equivalent to ?z = { f(x,y,@,?@) } (I implemented this syntax due to the ambiguity of @ in expressions like this). This would have a different type to the z from before.
That's odd! I thought we could only decide if something is a closure once we have type information. That's because the "output" for a closure can only be assigned a type once we know it cannot be a real call to the proc, and that only happens in type checking! It's been a few years since I implemented those transformations, so maybe I'm wrong. If I find some time I'll give it a deeper look.
For defaults for out parameters, I agree that it only makes sense from the callee's perspective. It can easily be emulated by assigning the output at the top of the proc, so the utility is all but nil.
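That emulation is just (reusing f from above):
def f(x:int, ?y:int) {
    ?y = 10
    if { x > 0 :: !y += x }
}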
Agreed, f(x,y,?z) should always be equivalent to f(x,y,*default*,?z). Unless the type of z is constrained elsewhere to be a higher order proc expecting an input argument whose type matches that of the defaulted argument. Or there's another proc named f with exactly 2 input arguments and 1 output with types matching x, y, and z. Overloading does complicate things.
Yes, I forgot about the required @() wrapper around an anonymous function/proc.
You're right, we can only determine whether something is a closure during type checking. But, unless I misunderstood the code, or am remembering wrong now, there is some code in normalisation/flattening that turns proc calls into closures in some circumstances. And when I commented that code out, several HO test cases failed. That's where I got stuck on handling optional arguments.