
Sigils for stores -- should we mark lvalues?

Open viridia opened this issue 4 years ago • 7 comments

This was one of the key points brought up in the SC feedback, and I suspect it is the most difficult one.

IIRC, the strongest arguments for using sigils for loads (instead of stores) are:

  • Stores are much more prevalent than loads in patterns, so marking loads instead of stores reduces overall syntactical clutter.
  • A key design tenet for pattern matching is that patterns should resemble the expressions used to construct the object that is being destructured. The motivation is to provide a mnemonic device for programmers learning to use patterns: if unsure of the syntax for a pattern, use the same syntax that you would use to construct the object or expression.
  • Similarly, there is a desire to be consistent with existing destructuring assignment syntax. Lvalue-references are not explicitly marked in statements of this type.

However, all of these arguments have weak points.

First, "clutter" is not a value-neutral term. It generally refers to excessive punctuation that harms readability. But we have not yet rigorously established that store-sigils would harm comprehension overall. At least one SC commenter opined that it would help readability and comprehension to mark stores. While it is true that extra punctuation might be jarring to look at, that may only be an artifact of its newness and unfamiliarity.

Second, the design tenet for patterns resembling construction, while well-intentioned, comes into conflict with other Python design tenets when applied too rigorously. It relies heavily on another key idea: that patterns are their own syntactical context that has different rules than regular Python code. A bare reference to an identifier within a pattern means something different than it does within a Python statement or expression. The mental leap necessary for grasping this change of context is an easy one to make for compiler-geeks like the PEP authors, but (judging from the mailing list traffic) is not as easy for the average Python programmer.

Third, the existing destructuring syntax does not have to deal with a mix of both l-values and r-values. Since there are only l-values in destructuring patterns, the syntactical choices are much simpler.
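To illustrate the third point, here is a minimal sketch of existing destructuring assignment (the names `point`, `x`, `y`, `head`, and `tail` are made up for illustration). Every name on the left-hand side is a store, so there is nothing to disambiguate:

```python
# Every name on the left-hand side is a binding target (an l-value).
point = (3, 4)
x, y = point

# Starred targets are stores too; nothing on the left is ever loaded.
head, *tail = [1, 2, 3]

print(x, y, head, tail)
```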

There's also a compelling argument in favor of marking stores: a number of the special cases in the PEP go away, and the overall complexity of the PEP is reduced. We no longer have to distinguish between simple names and compound names. We no longer have to warn users away from using pattern matching as a 'switch' statement. (Well, there may be other reasons not to use it that way, but at least it will function as the user expects).

What sigil do I propose? At some point I think we have discussed every punctuation character in the 7-bit ASCII set, and then some. There are some characters that obviously can't work - any character that is already used as a unary operator, or is a paired delimiter (like parens) is obviously off the table. Some characters, like period, have strongly-established meanings that don't harmonize with the intended use here.

As a strawman, I would propose caret (^). This suggestion actually surfaced briefly in previous discussions, but was abandoned when we went down the "sigils for loads" route. The reason for selecting this character is that (a) it isn't obviously disqualified by the criteria of the previous paragraph, and (b) it doesn't have a lot of "ink". By that I mean that it has few dark pixels compared to white pixels - this helps to mitigate the "clutter" critique mentioned earlier. A symbol like @ or $ has more ink and is more visually disruptive IMHO.

Note that this proposal does not entirely address some of the arguments raised previously - patterns are still syntactically special, they are just less special than before.

Under this scheme, a typical match statement might look like this:

    match expr:
        case BinaryOp(^op, ^left, ^right):
            result = \
                f"{format_expr(left, expr.precedence)} {op} {format_expr(right, expr.precedence+1)}"
            # Surround the result in parentheses if needed
            if precedence > expr.precedence:
                return f"({result})"
            else:
                return result
        case UnaryOp(^op, ^arg):
            return f"{op}{format_expr(arg, 0)}"
        case VarExpr(^name):
            return name
        case float() | int():
            return str(expr)
        else:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

I honestly don't think that looks too terrible.

However, what we don't know at this point is whether making a change like this will affect the SC vote. We know at least one SC member was opposed to using sigils for loads, but we don't know if any SC members were in favor of it.

See previous issues on this topic: #1 #90

viridia avatar Aug 25 '20 05:08 viridia

I think almost all of the (emotional) discussions and contentions around Pep 622 boil down to the community being split very evenly into two factions

  • those who'd always liked to have switch/case semantics in the language
  • those who'd love to have F#/Scala/Haskell style pattern matching

Pressing these two use cases into the same language construct just doesn't work without making some sacrifices. Proposed "solutions" so far (and why I think they might ultimately fail to get accepted):

  1. Focus on the pattern matching use-case only and simply strike load semantics from the Pep. This would have resulted in a first draft of Pep 622 that was much simpler with fewer exceptions. I think this ship has sailed - removing load semantics from the Pep now will not appease those that are against the Pep because they deem load vs. store too complicated (and they might only care for load semantics anyways). As @ambientnuance put it in #90

    The introduction of a core construct with one programming style front of mind feels like going against the grain of Python's flexibility.

  2. Putting sigils on load and/or store semantics. I think this won't fly because, while it makes load vs. store more explicit, it clutters up the syntax and is still vulnerable to the arguments about difficult-to-spot bugs and about explaining the intricacies of load vs. store semantics to beginners.


A bread and games proposal to Pep 622

Usually I'd say adding more features to a Pep that is already feature-packed and faces critique for its complexity is a bad idea but in this case I'm not so sure.

    where foo:
        equal x: ... # load semantics
        match int(x): ... # store semantics
        else: ...

At the price of one additional (soft-) keyword it would give us:

  • switch/case for those who always wanted it. Making load syntax explicit by the use of a keyword is in my opinion better than people abusing the currently proposed match syntax with load for switch/case.
  • separates load and store semantics with a keyword instead of some sigil/operator that is easy to miss (and ugly to look at)
  • Provides a path for teaching beginners about pattern matching with where by teaching them equal first and match with patterns later.
  • A lot of practical code will be used for handling stuff like parsing where both literal values and patterns are needed. Code would often have a neat structure with constants at the top:
    where data:
        # constants
        equal foo: ...
        equal zap: ...
    
        # patterns
        match Foo(foo): ...
        match Zap(zap): ...
    
  • Gives us a decision of where else should go
  • Is closer to "python is executable pseudo-code" than using arcane sigils, operators and underhand syntax. The way to read the statement is where x equals something/ matches something do this/that (else do default action). By reading it this way it is also clear why this is a statement (do this/that) and not an expression (like in other languages)
  • gives more info to linters for issuing warnings e.g. use of plain variables in match or warning that else is missing when there are just equal cases

Bikeshedding

  • first I had equals and matches instead of equal and match
  • maybe const is better than equal
  • how about match as top level keyword and const/check as case keywords
  • a somewhat radical idea: just omit the keyword for the load semantics:
    match data:
        foo: ... # load semantics without keyword
        case Foo(foo): ... # store semantics
    

stereobutter avatar Aug 25 '20 07:08 stereobutter

@SaschaSchlemmer Please stay out of our discussion. This is now between the SC and the PEP authors, and outside interference (however well meant) is very distracting. Please just sit on your hands and watch. If you keep adding comments I may have to figure out how to ban you, or revert the repo to Private. I don't want to do either of those things, but I cannot handle too many cooks in the kitchen right now.

gvanrossum avatar Aug 25 '20 15:08 gvanrossum

A potential problem with marking lvalues is that it opens the door for allowing arbitrary expressions in patterns. If you have to do something new and special to bind a value, you could easily allow things like this (none of which bind any variables):

    case (x+1, d[k], a[i+1], ",".join(a)): ...
    case {f(x): x, f(y): y}: ...
    case (p, q, *rest): ...

But now we'd have a problem fitting in class patterns. Is this a class pattern or just creating an object?

    case int(s): ...

Is this a function call or a class pattern?

    case func(a=1, b=2): ...

If we don't allow near-arbitrary expressions, basically keeping the existing proposal except requiring a ^ before a binding name, we'll definitely get pressure in the future to allow expressions -- and in the meantime the restrictions and special cases will still have to be explained to everyone learning about patterns:

  • you can use names, dotted names, literals,
  • but not function calls or subscripts or operators,
  • except | (or or), which has a special meaning,
  • and what looks like a function call really is a class pattern,
  • and list, tuple and dict literals are allowed,
  • but not set literals.

I understand that we have exactly those restrictions now, but they are currently motivated by the strong desire to have an unadorned, unqualified name be a capture pattern, and the other constructs are available to build up more complex patterns.

gvanrossum avatar Aug 25 '20 15:08 gvanrossum

Thanks, @viridia for summarising many of the issues with the load/store semantics (or lvalue/rvalue, respectively, according to some people). There are a few things I would like to reply to.

It relies heavily on another key idea: that patterns are their own syntactical context that has different rules than regular Python code. A bare reference to an identifier within a pattern means something different than it does within a Python statement or expression.

These two sentences show very clearly that we did not succeed in explaining the very basic idea of patterns in the first place. The rules for patterns are not that different to the rest of Python—if we could finally move away from comparing them to expressions! I think this is highly connected to Guido mentioning:

A potential problem with marking lvalues is that it opens the door for allowing arbitrary expressions in patterns.

Let us perhaps try and briefly recapitulate where patterns are coming from. The base form of a pattern is the name as a binding target, not a literal value like 0. Python has long introduced an extension in that the target name can also be 'tuple-like' to bind several names 'concurrently'. This can be used to deconstruct a sequence, of course. Now, with pattern matching, we basically build on this together with the question: "what if we don't know the length of a sequence and our assignment could fail?"
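As a rough sketch of that starting point, plain tuple-like targets already deconstruct a sequence, but a length mismatch raises instead of letting us try an alternative. The helper `split_pair` is a hypothetical name for illustration:

```python
def split_pair(seq):
    """Return (a, b) if seq has exactly two items, else None."""
    try:
        # Tuple-like target: binds two names at once.
        a, b = seq
    except ValueError:
        # Wrong length: the assignment fails rather than "not matching".
        return None
    return a, b


print(split_pair([1, 2]))
print(split_pair([1, 2, 3]))
```

A pattern turns that failure into a case that simply does not match, so the next alternative can be tried.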

It is then natural to ask whether we could use the idea of deconstruction on other data structures than just sequences. And it is convenient to integrate some basic comparisons into the picture, e.g. allow patterns to have literal values. It seems to me that too many readers see this as the central aspect of it all, rather than some syntactic sugar to make life easier. Anyway, perhaps the most tricky part is how to express those 'other data structures' without people mistakenly taking them for expressions, which feeds into another of @viridia's comments:

[...] the design tenet for patterns resembling construction [...]

At least for sequences, this has been true of Python for a long time, again, actually. You can write, e.g., (a, *b) = (a, *b), and sure enough the stars on the left and right hand sides complement each other to make this semantically basically a no-op. Isn't this symmetry exactly what makes it so easy to use and remember?
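A small runnable check of that symmetry (the starting values are arbitrary and chosen only for illustration):

```python
a, b = 1, [2, 3]

# The right-hand side builds the tuple (1, 2, 3); the left-hand side
# takes it apart again with the same shape, so nothing changes.
(a, *b) = (a, *b)

print(a, b)
```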

Our goal is, in a way, to find some syntax to bring in arbitrary classes so that, in principle, the above would generalise to C(a, b) = C(a, b), say, where C could be any class or type that defines a structure for the data. It seems that our problem with that is that so many are opposed to this, because now the left hand side looks like an expression (actually a function call). But, if we look closely again, Python is absolutely symmetric even here: def C(a, b): (just try to squint the def away ;-) ). You can even say def C(a, *b):, where the star (once again) nicely complements its use in other contexts.

Anyway, I feel that the entire load/store or lvalue/rvalue discussion is really quite symptomatic of a more fundamental issue, i.e. the nature of patterns in the first place.

P.S. Having said all that: if introducing a sigil like ^ really makes everyone happy and magically solves all our problems, I could live with it, although most certainly without enthusiasm.

Tobias-Kohn avatar Aug 25 '20 17:08 Tobias-Kohn

The load/store problem is:

  • hard
  • there was a lot of effort invested already and nothing new has come up
  • only mentioned as a big concern by a single SC member, who is the most negative about the PEP (i.e. effort put here has less chance of affecting the outcome).

If it was up to me, I would deprioritise our focus on this issue given other things to address....

dmoisset avatar Aug 25 '20 22:08 dmoisset

In the SC-VC we ended up deciding to keep the existing approach. However, Thomas plans to write a PEP to allow ? as a throwaway target anywhere, which would (if accepted in time for 3.10) invalidate the need for treating _ as a wildcard in patterns.

gvanrossum avatar Sep 16 '20 04:09 gvanrossum

Labeled as rejected (we're not marking lvalues) and fully pepped (PEP 635 addresses this).

gvanrossum avatar Oct 20 '20 17:10 gvanrossum