
Optional arguments

Open hanche opened this issue 3 years ago • 16 comments

I'll jump straight into the technicalities of this feature suggestion:

I want to be able to specify optional arguments to a function. By optional arguments, I don't mean named options as in &option=default, but positional optional arguments. I suggest the syntax ?name=default for this sort of argument, as in this example:

fn foo [?x=a y ?z=b @w u]{ echo $x $y $z $w $u }
foo 1 # fail
foo 1 2 # a 1 b [] 2
foo 1 2 3 # 1 2 b [] 3
foo 1 2 3 4 # 1 2 3 [] 4
foo 1 2 3 4 5 # 1 2 3 [4] 5
foo 1 2 3 4 5 6 # 1 2 3 [4 5] 6

One should allow any order of optional arguments, mandatory arguments, and rest arguments, with the usual limit on at most one rest argument, plus a prohibition against optional arguments after the rest argument, if there is one.

When the function is called, the arguments are assigned values from left to right, assigning the given values to positional optional arguments so long as there are enough given values remaining to assign to the remaining mandatory arguments.

In gruesome detail (I am leaving named options out, because they would be handled just as they are now): assume the formal arguments contain m mandatory arguments and o optional arguments, and the actual argument list contains v values.

If v<m, raise an exception. Likewise, if v>m+o and there is no rest argument, raise an exception. Otherwise, proceed as follows.

  1. If the first formal argument is a mandatory argument, assign the first given value to it. Decrement v and m, and repeat with the remaining formal arguments and values.
  2. If the first formal argument is optional and v=m, assign the default value, and repeat from 1 with the remaining formal arguments and values.
  3. If the first formal argument is optional (by necessity, m<v), assign the next given value to it, decrement v, and repeat from 1 with the remaining formal arguments and values.
  4. If the first formal argument is a rest argument, assign the list of the next v-m values to it, and then assign the remaining values to the remaining mandatory arguments (if any).
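The four steps above can be modeled in a few lines. This is a Python sketch, not Elvish code (the feature does not exist in Elvish); the tuple encoding of formal arguments and the name `bind` are hypothetical, and the sketch assumes the signature is already well formed (at most one rest argument, no optional argument after it).

```python
from typing import Any

# Hypothetical encoding of formal arguments:
#   ("M", name)           mandatory
#   ("O", name, default)  optional with a default value
#   ("R", name)           rest argument (at most one, no optional after it)


def bind(formals: list[tuple], values: list[Any]) -> dict[str, Any]:
    """Assign values to formals left to right, following steps 1-4."""
    m = sum(1 for f in formals if f[0] == "M")
    o = sum(1 for f in formals if f[0] == "O")
    has_rest = any(f[0] == "R" for f in formals)
    v = len(values)
    if v < m or (v > m + o and not has_rest):
        raise TypeError(f"arity mismatch: {v} values, {m} mandatory, {o} optional")

    bound: dict[str, Any] = {}
    i = 0  # index of the next unconsumed value; v - i values remain
    for f in formals:
        kind, name = f[0], f[1]
        if kind == "M":  # step 1: a mandatory argument takes the next value
            bound[name] = values[i]
            i += 1
            m -= 1
        elif kind == "O" and v - i == m:  # step 2: no surplus left, use the default
            bound[name] = f[2]
        elif kind == "O":  # step 3: surplus left, consume the next value
            bound[name] = values[i]
            i += 1
        else:  # step 4: rest takes what the trailing mandatory args don't need
            take = (v - i) - m
            bound[name] = values[i:i + take]
            i += take
    return bound


# fn foo [?x=a y ?z=b @w u] from the example above
foo = [("O", "x", "a"), ("M", "y"), ("O", "z", "b"), ("R", "w"), ("M", "u")]
for n in range(2, 7):
    b = bind(foo, list(range(1, n + 1)))
    print(b["x"], b["y"], b["z"], b["w"], b["u"])
```

Running it prints the same bindings as the foo examples (Python shows lists as `[4, 5]` rather than `[4 5]`), and `bind(foo, [1])` raises `TypeError`, matching the "fail" case.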

Rationale:

My most common scenario is for a function to take just one optional argument. Currently, there are two choices: either take a rest argument and raise an error if it holds more than one value, or use a named option instead. The first choice places an undue burden on the author of the function, the second on the user of the function.

Another interesting scenario is to implement a function with the signature foo ?x y. Currently, this requires code like this:

fn foo [x @y]{
  if (> (count $y) 1) { fail 'too many args' }
  if (eq $y []) {
    # only one value was given: it belongs to y, and x gets $nil
    set x y = $nil $x
  } else {
    set y = $y[0]
  }
  # and here we can finally start doing what we really wanted to do
}

The suggested feature should nicely take care of these two, and most other such requirements that users might have. Also, it is probably not hard to implement, as shown by my (not so) “gruesome detail” section above.

hanche, Apr 03 '21 16:04

I am mostly on board with this idea. For the syntax, it could just be x=foo, letting the presence of a default value denote the optionality of the argument.

However, the generalized mechanism for dealing with interlaced mandatory arguments, optional arguments and the rest argument feels over-engineered - it can surely work, but it would be difficult to understand.

My proposal is to not bother supporting a generic mechanism, and instead support just some common patterns. If M denotes a mandatory argument, O an optional argument, and R a rest argument, the supported patterns can be described with regular expressions:

  • Pattern 1: M*O*R? - the "classical" pattern of mandatory, then optional, then rest pattern.
  • Pattern 2: OM* - a single optional argument preceding mandatory arguments. The rest argument is forbidden. Edit: I've decided that this can be generalized to M*OM+.
  • Pattern 3: M*RM+ - a rest argument anywhere in the list. Optional arguments are forbidden.

Any argument list that does not conform to one of the patterns will result in a compilation error.
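As a sketch of how such a check could work, a signature can be encoded as a string over M, O and R and matched against the three regular expressions. The string encoding and the `classify` helper are hypothetical, and pattern 2 uses the generalized M*OM+ form.

```python
import re
from typing import Optional

# A signature is encoded as a string: M = mandatory, O = optional, R = rest.
PATTERNS = [
    ("pattern 1", re.compile(r"M*O*R?")),  # mandatory, then optional, then rest
    ("pattern 2", re.compile(r"M*OM+")),   # one optional among mandatory args
    ("pattern 3", re.compile(r"M*RM+")),   # rest followed by mandatory args
]


def classify(signature: str) -> Optional[str]:
    """Return the first pattern matching the whole signature, or None
    (which would correspond to a compilation error)."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(signature):
            return name
    return None


for sig in ["MOOR", "OM", "MOMM", "MRM", "OOM", "ORM"]:
    print(sig, classify(sig))
```

Signatures such as OOM (two optionals before a mandatory argument) or ORM fall outside all three patterns and would be rejected.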


On a side note, there is a case to be made that the positional/named dimension and the mandatory/optional dimension should be completely orthogonal; this is what Julia does, and I vaguely remember some pretty convincing arguments for it, but can no longer recall them.

Should Elvish go this route, the syntax for options could be changed so that an option without a default value, e.g. &x, represents a mandatory named argument.

xiaq, Apr 03 '21 16:04

I agree with your suggested modification to the syntax, getting rid of the question marks and (possibly) introducing mandatory named options.

I don't quite buy your arguments against the generic solution, though. First, I think the mechanism is fairly easy to explain: assign values in order from left to right, ensuring that mandatory arguments get filled, supplying leftover values to optional arguments as far as possible, and putting any overflow into the rest argument. I think this might be as easy to implement on the Go side as supporting a limited set of patterns, or even easier. (By the way, should your “pattern 3” be M*RM*?)

I do agree that this can get complicated on the Elvish side. However, the onus is on the Elvish programmer to write readable, understandable code, and I do not think the language should enforce artificial restrictions to ensure that worthy goal. (Real programmers can write spaghetti code in any language.) If you supply only a few patterns, someone is inevitably going to bump into a use case that does not fit them, and then they will possibly just abuse the rest variable in the same way as in my example above (or worse).

hanche, Apr 03 '21 17:04

Right, pattern 3 should be M*RM*; I deliberately wrote it as M*RM+, though, so that there is no overlap between patterns 1 and 3.

The rules are surely not hard to understand, once you have read about it. I have two counter-arguments:

  • The rule is not an obvious one that is intuitive in all cases. For example, if a function takes two optional arguments and one mandatory argument, i.e. a? b? c, and is passed two arguments foo and bar, should foo be assigned to a or b? The rule you specified says a, but one might guess that it should be assigned to b so that the assigned variables form a continuous block.

    In contrast, all of the 3 patterns I listed above are fairly intuitive. Pattern 2 and pattern 3 are both unambiguous on how arguments should be filled. Pattern 1 is theoretically ambiguous, but it just fills all arguments from left to right, which is quite intuitive. This means that even someone who has never read the rules can reasonably guess how the arguments are filled.

  • Anything outside the 3 patterns I listed above seems to be really of dubious usefulness - I struggle to come up with examples of where they're useful.
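To make the ambiguity in the first bullet concrete, here is a hypothetical Python helper that binds a signature under either reading: surplus values go to the leftmost optional arguments (the rule in the original proposal), or to the rightmost ones (which, for `a? b? c`, gives the contiguous block). `None` stands for "use the default", and the helper assumes the number of values is within the valid range.

```python
from typing import Any, Optional


def bind(sig: list[tuple[str, bool]], values: list[Any],
         leftmost: bool) -> dict[str, Optional[Any]]:
    """sig is a list of (name, is_optional); None marks an argument left at its default."""
    surplus = len(values) - sum(1 for _, optional in sig if not optional)
    opt_idx = [i for i, (_, optional) in enumerate(sig) if optional]
    # Pick which optional arguments receive the surplus values.
    chosen = set(opt_idx[:surplus] if leftmost else opt_idx[len(opt_idx) - surplus:])
    it = iter(values)
    return {name: next(it) if not optional or i in chosen else None
            for i, (name, optional) in enumerate(sig)}


sig = [("a", True), ("b", True), ("c", False)]   # a? b? c
print(bind(sig, ["foo", "bar"], leftmost=True))   # rule as proposed
print(bind(sig, ["foo", "bar"], leftmost=False))  # "contiguous block" guess
```

Both readings are internally consistent; they simply disagree on whether foo lands in a or b.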

Re the 3 patterns being artificial restrictions - yes, they are, and from a theoretical point of view they make the language more complex. But they also make the language easier for humans to understand.

xiaq, Apr 03 '21 18:04

Hmm, Elvish is your baby, of course, so I defer to your judgment. I am happy about the parts we do agree on! And I admit that at the moment, I can't come up with a convincing use case outside the ones listed, either.

hanche, Apr 03 '21 19:04

When the function is called, the arguments are assigned values from left to right, assigning the given values to positional optional arguments so long as there are enough given values remaining to assign to the remaining mandatory arguments.

The "so long as there are enough..." clause was where I started feeling queasy. Consider a function with three optional and three mandatory args that are interlaced: fn foo [?a=a b ?c=c d ?e=e f]{ }. Then consider even more pathological cases. Consider invocations with between three and six arguments, inclusive. Also, your rule for handling a "rest" argument means that optional and "rest" arguments are incompatible. That is subtle, and thus an argument against your generalized solution.

I'm inclined to agree with @xiaq that only a few patterns should be recognized. In particular, it seems to me the most common pattern addressed by this proposal is that of the cd command, which takes zero or one argument.

krader1961, Apr 06 '21 05:04

Consider a function with three optional and three mandatory args that are interlaced

I did consider that, and I suppose the programmer who writes a function with that signature should be sentenced to a year of reading other people's programs written in brainfuck. I wrote the suggestion this way because I thought it would be easy to implement, not too hard to explain, and quite general.

Also, your rule for handling a "rest" argument means that optional and "rest" arguments are incompatible.

I don't think so, but it is not worth quibbling about if it is not going to be implemented that way anyhow. But note that my prohibition against optional arguments after a rest argument was intended to dispose of any ambiguity. If it does not seem so, remember that optional and rest arguments are prioritized from the left, so all optional arguments must be filled before the rest argument gets any values. That is not so different from the M*O*R? pattern.

Be that as it may, let me propose a compromise: join @xiaq's three patterns into one: M*O*R?M*. That avoids the problems of interlaced optional/mandatory arguments, and it is conceptually quite simple: it is just like today's M*R?M* pattern, except that you now pull out the first few of the “rest” arguments and give them names. This is easy to document: optional arguments and any rest argument must occur together with no mandatory arguments in between, and the rest argument comes after any optional ones. What could be simpler? The fact that no sane programmer would write a function with three mandatory arguments, four optional ones, a rest argument, and five more mandatory arguments is not a counterargument I buy. No sane programmer will write a function with twelve mandatory positional arguments either, but the language does not prohibit it, and rightly so.

An example, to make things clearer:

fn fun [x y=$nil @z w]{ … }

This is easy to explain: There must be at least two arguments. The first is x, the last is w, and in between there is an optional y followed by an even more optional rest argument z. Even though I can't think of a use for this pattern at the moment, I also can't come up with a good argument saying the need for it will never arise. One can avoid it easily enough by putting the w argument second, but if it is a sort of target, it will feel more natural to put it at the end.
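One way to see how the compromise relates to the three patterns is to enumerate small signatures (again encoded as strings over M, O and R, a hypothetical encoding) and compare the regular expressions. The sketch below checks, for signatures up to length 6, that M*O*R?M* accepts everything the three patterns accept, and lists the extra shapes it admits up to length 3.

```python
import itertools
import re

THREE_PATTERNS = [re.compile(p) for p in (r"M*O*R?", r"M*OM+", r"M*RM+")]
COMPROMISE = re.compile(r"M*O*R?M*")


def signatures(max_len: int):
    """Yield every string over {M, O, R} up to max_len with at most one R."""
    for n in range(max_len + 1):
        for combo in itertools.product("MOR", repeat=n):
            sig = "".join(combo)
            if sig.count("R") <= 1:
                yield sig


def in_three_patterns(sig: str) -> bool:
    return any(p.fullmatch(sig) for p in THREE_PATTERNS)


# Everything the three patterns accept is also accepted by the compromise.
assert all(COMPROMISE.fullmatch(s) for s in signatures(6) if in_three_patterns(s))

# Shapes the compromise admits that none of the three patterns do.
extra = [s for s in signatures(3)
         if COMPROMISE.fullmatch(s) and not in_three_patterns(s)]
print(extra)  # ['OOM', 'ORM']
```

The extra shapes are exactly those with several optional arguments, or an optional plus a rest argument, before a trailing mandatory one.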

hanche, Apr 06 '21 07:04

Be that as it may, let me propose a compromise: join @xiaq's three patterns into one: M*O*R?M*. That avoids the problems of interlaced optional/mandatory arguments, and it is conceptually quite simple: it is just like today's M*R?M* pattern, except that you now pull out the first few of the “rest” arguments and give them names. This is easy to document: optional arguments and any rest argument must occur together with no mandatory arguments in between, and the rest argument comes after any optional ones.

This is indeed a pretty neat way to unify the 3 patterns I proposed, and reasonably intuitive.

But my concern remains that the rule is hard to guess right. In [x=$nil y=$nil z]{ ... } foo bar, how obvious is it that foo should be assigned to x instead of y? I'd say not very.

What could be simpler? The fact that no sane programmer would write a function with three mandatory arguments, four optional ones, a rest argument, and five more mandatory arguments is not a counterargument I buy. No sane programmer will write a function with twelve mandatory positional arguments either, but the language does not prohibit it, and rightly so.

There is a difference. Once you have learned about functions that take 2 mandatory arguments, 3 mandatory arguments, etc., it is clear how a function that takes 12 mandatory arguments should behave, because there is no ambiguity in how to generalize the mechanism. This is not the case for the problem here: after learning that [x y=$nil z=$nil]{ ... } foo bar populates $x and $y, there are two different ways to generalize that knowledge, by analogy, to [x=$nil y=$nil z]{ ... } foo bar:

  • This should populate $x and $z. The analogy is that optional arguments are populated from left to right.
  • This should populate $y and $z. The analogy is that optional arguments should form a continuous block with mandatory arguments.

The ambiguity only arises when there are multiple optional arguments, which is exactly why in my pattern 2, I'm restricting the number of optional arguments to 1. Although come to think of it, there is no need to restrict where it can appear, so it could be generalized to M*OM+, which is now very similar to pattern 3 (M*RM+).

xiaq, Apr 06 '21 19:04

Hmm. Too tired now, and I'm taking off for a bit of vacation, so I will let this sit and mature for a bit.

For amusement, though, let me offer this monstrosity from the classical date(1) man page:

     date [-jnu] [[[mm]dd]HH]MM[[cc]yy][.ss]

There are optional parts here, all over the place. In what order are they populated? (I am pretty sure the [[cc]yy] part gets populated last, but only because it is the only thing that makes sense in the context. The notation does not help.)

Hey, for a moment there I played with the thought of using more square brackets to disambiguate the order in which optional arguments are to be populated. But that is just too wild, even for me.

hanche, Apr 06 '21 19:04

Oh wait, a couple thoughts popped into my head.

First, the problem with twelve mandatory arguments is not figuring out which value goes where, but simply making sense of that many arguments and remembering which is which, and not messing it up when your count is off.

With optional arguments, I think the only case where it is justified to have several of them in a row is when their relationship is such that it makes little sense to supply the second unless you also supply the first, and even less sense to supply the third unless you also supply the first two. If that is not the case, one should use named options instead. Fortunately, we have named options, so the language already encourages good programming practices, though it does not (and should not try to) enforce them.

hanche, Apr 06 '21 20:04

@hanche: Your date command example is unambiguous, albeit horrendous. It is left-associative, and it seems likely the author was just showing off and over-engineering the parsing. Also, it is only used to set the system clock, which is almost never done, and anyone using it to set just the current minute value (or any other subset of date components) needs to be taken to the woodshed and punished :-)

It also involves a single positional argument, not multiple arguments, and is therefore not a valid example of the point you're trying to make. Still, I get the point, which is to consider a date command with separate positional arguments for the month, day, hour, minute, century, year, and seconds, where only the minute argument is mandatory. That is an O*MO* pattern, more or less. The analogy doesn't work, primarily because the literal period that introduces the seconds "argument" eliminates the only source of ambiguity. Which is why arguing by analogy is always fraught. :smile:

@hanche: I find it interesting that your first rationale was this:

My most common scenario is for a function to take just one optional argument.

Which is essentially the cd command pattern. I too would like to see that pattern supported without resorting to the @rest argument as the sole argument.

This is a good example of a feature that should be driven by real-world use cases and only generalized when doing so can be shown to be unambiguous to a human (not the computer). That is why I think there should be restrictions on the pattern(s) for recognizing optional positional args, such as those proposed by @xiaq.

krader1961, Apr 07 '21 05:04

@hanche Well, my concern about allowing multiple optional arguments before mandatory arguments is strictly about the guessability of the rule, as I argued before, which I haven't seen a counter-argument on. It's not about forcing best practices.

xiaq, Apr 09 '21 00:04

@xiaq Well and succinctly put. I admittedly don't have a strong counter-argument, other than what seems natural to me: filling from left to right. I'd have thought this is the most easily guessable rule, even for people used to right-to-left languages, since the programming language, after all, reads left to right.

There is just one example I can think of that sort of supports my idea: the external seq command:

seq [first [incr]] last

When you designed the built-in range, did you even consider the seq calling convention, instead of introducing the &step named option, as you did?

But yeah, I think you have me convinced. Maybe not quite 100%, but at least half way. That is good enough.

hanche, Apr 09 '21 06:04

When you designed the built-in range, did you even consider the seq calling convention, instead of introducing the &step named option, as you did?

Let me steal that great example to illustrate my point ;-)

Yes, I considered seq's signature (low=1 step=1 high in the proposed syntax). I found it confusing. If you consider Python's range, it does conceptually the same thing, but with a different signature (low=0 high step=1). Hence I concluded that there's no "natural" position for the step argument and it's best kept as an option. Sensible programmers pass step by name wherever the API allows it (Python's built-in range, as it happens, accepts only positional arguments, so range(1, 100, step=2) raises a TypeError); sadly seq doesn't have that option either (pun intended).

I feel that this might be true in general when you're authoring a command that takes more than one optional argument - you'll struggle to find natural positions for them, and different people will have different preferences. You're better off turning the extra optional positional arguments into keyword arguments; hence pattern 2 only allows a single O.

It's a bit less true when multiple optional arguments are all at the end, when there could sometimes be a sense of transitioning from required arguments to "somewhat optional", and then to "entirely optional". Hence pattern 1 allows O*.

Again, this is not my key concern; that remains the cost of supporting arbitrary interleaving, namely that the rule is hard to guess. But the lack of benefit does play a minor role too.

xiaq, Apr 09 '21 12:04

I don't find examples like the seq command a compelling argument for adding a maximally flexible implementation of optional arguments. The seq command is an anti-pattern, IMHO. I believe the first version of seq was written before there was an established mechanism for CLI options (i.e., "flags" such as --incr 2). Subsequent versions simply inherited its awful API. Ask a random set of programmers what seq 1 2 3 and seq 1 3 do. And even if seq was written when there was a more-or-less standard mechanism for specifying optional values, that does not excuse its awful API.
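To illustrate the question, here is a hypothetical Python model of seq's positional convention `seq [first [incr]] last` (integers and positive increments only; `seq_like` is not a real API), next to Python's built-in `range(start, stop, step)`, whose stop is exclusive and whose step comes last.

```python
def seq_like(*args: int) -> list[int]:
    """Model of seq [first [incr]] last: first and incr default to 1, last is inclusive.
    Only integer arguments and positive increments are handled."""
    if len(args) == 1:
        first, incr, last = 1, 1, args[0]
    elif len(args) == 2:
        first, incr, last = args[0], 1, args[1]
    elif len(args) == 3:
        first, incr, last = args
    else:
        raise TypeError("expected 1 to 3 arguments")
    return list(range(first, last + 1, incr))


print(seq_like(1, 2, 3), list(range(1, 2, 3)))  # [1, 3] [1]
print(seq_like(1, 3), list(range(1, 3)))        # [1, 2, 3] [1, 2]
```

The same numbers mean different things under the two conventions, which is the guessability problem in miniature.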

krader1961, Apr 11 '21 04:04

I don't find examples like the seq command a compelling argument for adding a maximally flexible implementation of optional arguments.

Neither do I, really. But I thought it interesting to bring that up as an example from the Real World™️. And I am glad I did, for it resulted in the above post by @xiaq , which I found illuminating.

PS. In case it is not clear, I am concerned here with exploring the possibilities, not with “winning” an argument. I am perfectly okay with @xiaq's three patterns; I'm just trying to make sure the consequences of that choice are well understood.

hanche, Apr 11 '21 05:04

It would be nice to see this feature implemented. I just needed it, and after thoroughly reading the docs I realized Elvish doesn't have it.

Ultra-Code, Feb 10 '24 14:02