Revisit load vs. store
A bunch of folks on python-dev brought up that it's confusing to have case foo: mean "assign to foo" and have case .foo: to alter the meaning to "compare to the value of foo".
I think we're going to need another round of this discussion.
Yeah, this will likely be the biggest sticking-point. I'm not personally even 80% satisfied with any proposed option, which is... frustrating.
This is probably one of the most obvious cases where the two different heritages of pattern matching clash. Hence, whatever solution we come up with, half of the people won't like it because it contradicts their approach to this.
One of the major problems I see here is that we often discuss this without context in an abstract space of possibilities. A single case foo: by itself does indeed look like it should have load semantics. It seems to be even stronger with something like case number_of_dots:, say. But what about case x: or case n:? As humans, we look at everything within a given context and picture such a context ourselves, when it is not given. The parser, however, has no such context (at least in Python), but needs to figure things out locally. Once again: whatever we do, there are always cases where we feel the compiler should be smart enough to understand our intent...
We should for the sake of this discussion probably rather stick to examples like the case Point(x, y):, as it is much more realistic to use pattern matching for something like this, rather than the trivial case case point:. Just as we would not want to discuss the details of tuple unpacking on an example like a, = foo().
I think we quite agree that all dotted names must have load semantics. The rationale being that we only want to assign to local variables. Ideally, pattern matching has no side effects beyond these local variables (we cannot strictly enforce this in Python, of course) and assigning to dotted names would basically mean that we write to attributes and global variables.
Flipping the semantics of .foo and foo so that .foo has store semantics seems like a bad idea given the previous thoughts: it would introduce an exception or counter-rule and thus make our lives harder. So, if we had to make both .foo and foo have load semantics, this would leave us with two options as far as I can see—neither of which looks very appealing to me:
- Distinguish the load/store semantics based on something like upper/lower case, or require that all "store" names must end in an underscore (I think, e.g., Basic used to have type markers in variable names).
- Introduce another beloved symbol like `$` (or whatever) for store semantics.
Given that we are moving inside a network of interdependent rules, customs, and use cases, there are realistically much fewer feasible options than what you would naively expect.
@JelleZijlstra brought up a good argument for marking extraction variables:
So instead of:

    y = 3
    case Point(x, .y): ...  # x is assigned to, y is looked up

we'd have:

    y = 3
    case Point($x, y): ...  # x is assigned to, y is looked up

The trouble with the current syntax is that if you forget the ".", you always get a hard-to-detect bug: your pattern unexpectedly matches and "y" suddenly has a different value. Even if you find the bug, it's hard to find out where exactly the mistake happened.
But with the proposed "$" syntax, if you forget the "$", you probably will just immediately get a NameError that will tell you exactly where your bug is. (Except of course if you happen to already have the name "x" in scope, but that's hopefully not very common, and it's already what happens if you typo a local variable name.)
(Though I think this would be an even stronger argument with "?" instead of "$". :-)
Hello Guido, Brandt and Tobias :) This PEP is the first dev topic I've commented on, so apologies if I'm stepping on any toes in my approach.
I would like to bring to attention an alternative for extraction variables that I posted in the Python-Dev mailing list, which is in the same vein as a proposal from @dmoisset (feel free to correct/elaborate on anything Daniel). Posting it here is motivated by a sense that it may have flown under the radar while the discussion comes to a head between established options (primarily '. / ? / $' as a prefix) - see quotes below from #92:
@gvanrossum
Still, the only two alternatives available seem to be the current proposal or something using `?`. :-(
@brandtbucher
If Python-Dev responds positively to a simple yes/no on `?`, I'd say we pull the trigger.
In short, the idea is to use the scope of a match block to define upfront variables which can be used for extraction purposes. Here are some examples that span across @dmoisset's post and my own, where 'x' and 'y' are extraction variables and the body of case statements is omitted:
    match point bind c:   # 'c' is a 'capture' object used to express any extraction variable as an attribute
        case (c.x, c.y): ...

    match point bind x, y:   # 'x' and 'y' are individual extraction variables
        case: ...

    match point:
        bind x, y
        case: ...
The nomenclature/syntax is certainly open to debate. Some alternatives to kick off with:

- `match _ [bind] _:` -> alternatives for `bind`: `into`, `in`, `as`
- `match _:` / `[bind] _` (keyword inside the block) -> alternatives for `bind`: `proxy`/`placeholder`, `sub`/`substitute`
- `match(c) point:`
- `match(x, y) point:`
My post: https://mail.python.org/archives/list/[email protected]/message/FRC5ZNXQPWWA4D2SJM4TYWMN5VALD3O6/
@dmoisset's post (proposal 1.B): https://mail.python.org/archives/list/[email protected]/message/43YOZUKP3GJ66Z2V2NKSJARL4CGKISEH/ My reply: https://mail.python.org/archives/list/[email protected]/message/C6EP2L66LBJKT5RHDE6OIKG7KWM2NMLV/
Some (somewhat biased) pros and cons compared to special prefix characters...
PROS:
- Variables are not typed out any differently, similar to lambda and comprehension expressions which do a similar type of binding (avoids '$number' being unrelated to 'number'). Also consistent with all other variable usage and naming (unique names, case not enforced).
- Being explicit about the use of extraction variables upfront can 'prime' a reader for their use in the body of case statements.
- Particularly useful if people 'default' to using match/case as a switch/case construct.
- Very helpful for a first-pass understanding of a code block, with or without experience with the match/case construct.
- All special-case syntax of the match/case construct remains prose-like, which is consistent with other control flow blocks.
CONS:
- Variables are not marked for extraction at their point of use.
- Additional soft keyword (although this seems similar to the 'case' keyword).
- Verbosity (subjective, since I personally consider the repeated use of a special character verbose)
P.S. English naturally lends itself to duplicative terminology, as seen by the many different terms already used to refer to what the PEP calls a 'Name Pattern': [extraction/capture] variable, assignment [target/cue], placeholder. I'd like to get thoughts on whether some clarity is helpful in these scenarios, or if it's better to let the language range freely? (I am guilty of this with the 'placeholder' term.)
I want to bring up one subtle point that I think has been overlooked: If we do define a prefix operator for named pattern variables, this does not affect the walrus operator. In other words, you can still use := to assign a value to a variable, without having to prefix that variable with ? or whatever symbol we choose.
This means that ?name is, essentially, a shortcut for name := ?. I am actually fine with this - having a succinct syntax for the (vastly) more common case beats OOWTDI in this case.
I created an experimental branch https://github.com/gvanrossum/patma/blob/expr-qmark/examples/expr.py showing what my expr.py example would look like with named variables explicitly declared with a prefix character. My reaction is:
PRO: It is very obvious which names are pattern bindings (stores) and which are loads.
PRO: It's very easy to explain the rules to a beginner.
PRO: This solves the problem of accidentally forgetting the . and overwriting an important symbol - in other words, it fails fast rather than being a potential time bomb.
CON: The extra character adds a substantial amount of visual clutter. Named pattern variables are very common.
Another variant to differentiate variable binding (store) from comparing to the values of a (global) variable that I have not seen discussed elsewhere is:
Example 1:

    x = 'hello world'
    # ...
    match example:
        case global x: print('how boring...')   # load a global variable
        case x: print(f'How ingenious: {x}')    # store x

Example 2:

    def check_example(example):
        default = 'hello world'
        match example:
            case nonlocal default: print('how boring...')  # load a variable from the outer scope
            case x: print(f'How ingenious: {x}')           # store x
I personally find this much clearer in intent than the .x or ?x variants I've seen floating around for differentiating store from load; but that is just me. Objective advantages are:

- no new magic syntax/symbol
- less visual clutter compared to `?x`, `x?`, `.x`
- less likely to be overlooked/omitted by accident (compared to the currently proposed `.x` from the PEP)
- no collision with PEP 505's use of `?` for None-aware operators
- people from all kinds of programming languages are already familiar with modifier keywords, and with `async def` there is somewhat of a precedent in Python for using another keyword to modify a statement
- keywords like `global` and `nonlocal` are easier to google for than symbols like `?`
@SaschaSchlemmer This is clean but a bit verbose, especially when the patterns are nested.
Also, in your proposal, I wonder if we're supposed to mark it global in each occurrence of x within a case clause?

- If not, another problem would arise: you may mark a few names `global` in different positions, which is a little bit messy.
- If so, wouldn't this be verbose? E.g. `case (global C1)(global C2(...), ...)`.
@thautwarm

- I intended that every occurrence that should load something must be modified with either `global` or `nonlocal`:

      match example:
          case Point(global x, global x): ...
          ...
- I've read somewhere (I think it was in one of @gvanrossum's posts on the mailing list) that one main use case for loading was comparing variables to globals, similar to a switch statement in C. For these use cases I don't expect a lot of nesting; is that a valid line of thinking?

      match log_level:
          case global DEBUG: ...
          case global WARN: ...
          case global INFO: ...
I must admit I am fully in the FP camp where I will use match mostly for unpacking data structures and the like, and I don't see the appeal of using match as a glorified C-style switch statement. This is probably the greatest issue with the PEP as a whole: what is the main use case (the one that gets the nice, clutter-free syntax), loading (aka C-style switch statement) or binding to names (FP-style pattern matching)?
What is the main use case (the one that gets the nice, clutter-free syntax), loading (aka C-style switch statement) or binding to names (FP-style pattern matching)?
You hit the nail on the head here. The main use case is definitely FP style structure unpacking. An early proposal didn’t even have constant value patterns. But named constants and enums are very much part of Python’s culture and we felt we had to support them. And now this has caused a dilemma.
@gvanrossum
I wonder if using a prefix ! operator to indicate a store has been rejected? I don't see a discussion about this here or in the python-dev mailing list. (Instead of ? or *.)
Semantically I think ! is better, though it still doesn't quite follow Python's traditional style?
! and ? are about equally ugly.
They're both ugly but ! has a closer meaning for storing variables: https://github.com/gvanrossum/patma/issues/92#issuecomment-649392775
Given typical usage I still want to use unadorned names for capture variables. Maybe we can introduce some notation that allows any expression to be used as a value? What about ‘+x‘ or ‘+(x+y)‘ ?
Yes, ! and ? are very similar, apart from one important detail: ! is often used with the meaning of not. Even in Python you find it used like this, e.g. in !=. Hence, ! would probably trip up a lot of people.
More generally, however, I would certainly welcome an operator to evaluate expressions, rather than a store marker. As I indicated before, there are other languages that actually use + in a very similar vein, although $ is used as a general evaluation operator (mind, I am not saying $ is a nice choice, but rather advocating the idea behind it).
Using nonlocal and global as load markers seems a very bad idea, as this would actually reverse their current meaning in the new context! We place a global pseudo-statement in our function to say that the global variable is modified. And then, of course, there is the problem with being way too verbose.
@ambientnuance We actually considered and briefly discussed the idea of explicitly listing the names of either the loads or stores, as evidenced here. Actually, this variant reminds me a lot of Pascal, where you had to declare all local variables upfront, and it therefore feels very backward to me. But my main concern is that it does not scale well. When patterns get larger and more complex, it gets difficult to keep track of which names have load and which have store semantics, respectively. While the compiler could certainly handle it, it becomes hard for us humans to read code when the actual meaning of a symbol can no longer be determined by local cues. I would therefore very much prefer a solution that determines a name's semantics locally.
Rightio, thanks for your follow up @Tobias-Kohn. A good self-reminder to search through all GitHub issues, not just recent ones.
Could this be added to the ‘Rejected Ideas’ in the PEP (presumably under ‘Alternatives for constant value pattern’)? I can understand if this may be deferred, given the active discussion of some options currently in that section.
Regarding poor scaling in large match blocks (or indeed as you note, complex patterns), this is definitely the weak spot for a ‘declarative’ approach. There is certainly a strong appeal in being locally unambiguous. Nonetheless, this hadn’t weighed heavily in my mind, as I had mentally assigned the task of distinguishing store and local variables to a syntax highlighter - with unique names enforced. My personal choice would be to highlight store variables in some manner, since they are distinct from all other variables. But, I was reminded today that many people prefer to have a muted theme, and also learnt that some choose to forgo syntax modification altogether.
In any case, @aeros made a worthwhile comment in the dev mailing list that I think is relevant here. They emphasised the importance of easy readability for a special character modifier, particularly for those with any visual impairments (size being the dominant factor). They also made a good counter-argument to my syntax highlighter crutch, albeit directed at the use of smaller special characters such as '.' :
However, I don't think it should be at all necessary for people to rely on syntax highlighting to be able to clearly see something that's part of a core Python language feature.
Their full comment: https://mail.python.org/archives/list/[email protected]/thread/RFW56R7LTSC3QSNIZPNZ26FZ3ZEUCZ3C/
Thanks for your time amidst what seems to be a hot topic.
The way I would characterize the current dilemma is that explicitly marking stores has a number of compelling advantages - such as avoiding the "foot gun" problem; the major sticking point is aesthetics.
@gvanrossum
The main use case is definitely FP style structure unpacking. An early proposal didn’t even have constant value patterns. But named constants and enums are very much part of Python’s culture and we felt we had to support them. And now this has caused a dilemma. [emphasis added]
Maybe taking a step back and seeing whether there is an actual need to support constant value patterns (like .x as proposed in the PEP) is an avenue away from bikeshedding all the (less than ideal) syntactical variants of the store vs load dilemma to death:
- `if ... elif ... else` is not going away and we don't want everybody to switch (sry for the pun) to using `match` everywhere instead. If control flow (dependent on some variable) is what you want, use the pythonic control flow statement you have always used. See also https://github.com/gvanrossum/patma/issues/97#issuecomment-649875519 for support of this:

      if log_level == DEBUG:
          # do this
      elif log_level == WARN:
          # do that
      else:
          # do whatever

- if one wants to use `match` with constant values there is `Enum` (which one might consider a better choice than global variables `DEBUG`, `WARN` etc. anyway) or using something like:

      match log_level:
          case logging.DEBUG: ...
          case logging.WARN: ...
          case _: ...

- (Not only but especially for beginners) the subtlety of store vs load in `match` is avoided completely (regardless of what the syntax for store vs. load were). Again the message is clear:
  - for control flow use `if ... elif ... else`
  - for FP-style destructuring of objects use `match` (advanced topic, not for beginners).

  These are your father's `match` statements, elegant control flow statements for a more ... civilized age. — python grandmaster in a galaxy far far away

- Without load semantics on variables there is no need for special syntax like `.x`, `x?` or `` `x` ``
- No bugs due to misunderstanding load vs. store or mistyping (I am looking at you, `.x`) for variables
- might simplify the implementation on the parser side?
@SaschaSchlemmer The introduction of a core construct with one programming style front of mind feels like going against the grain of Python's flexibility. I can see myself using this tool with generic variables (not just constants) whilst doing structure checks. An outlier use-case most likely, but blending paradigms helps me free up how I iterate on code.
If the construct is intended for FP-like structure checks as a first-class use-case, then one way of making that clear might be going back to the ‘as’ or ‘case’ discussion. The former makes it much harder to confuse with switch/case usage while still coherent: “match an object as having this pattern”.
Happily marrying the two worlds seems like something worthwhile for the significant expansion in functionality. I’ve mentioned this in another thread already, but I think @brandtbucher’s update to expr.py in #105 provides a low-friction avenue to do so.
EDIT: To be clear, you do make a strong argument. The distinction between pattern matching and control flow is advantageous.
Given typical usage I still want to use unadorned names for capture variables. Maybe we can introduce some notation that allows any expression to be used as a value? What about ‘+x‘ or ‘+(x+y)‘ ?
    match log_level:
        case `DEBUG`: ...   # load
        case `WARN`: ...    # load
        case level: print(f'{level} is not a valid log level')   # store
doesn't read too bad and leaves the unadorned form for capturing variables. I think, though, that arbitrary expressions (like `(x+y)`) should be explicitly forbidden.
one caveat: `WARN` does look deceptively similar to 'WARN' without any syntax highlighting (but is fine with some highlighting)
@SaschaSchlemmer that has the same problem I identified above: it's very easy to just write case DEBUG: and have a hard-to-detect bug.
@SaschaSchlemmer that has the same problem I identified above: it's very easy to just write `case DEBUG:` and have a hard-to-detect bug.
That case can be made for any solution, regardless of the actual syntax. Mixing load and store semantics is maybe just not a good idea for these cases where the meaning is ambiguous. This (perceived) gain in functionality might just not be worth the potential for bugs and confusing users.
I propose that load semantics should be constrained to dotted names (like foo.bar) in the first release. This decision is also somewhat of a two-way door in that a later release could allow simple names with a new adornment syntax (like `x` or x!) if deemed a useful feature by the community.
I do think that ? is clearer in the case where you have a store on an attribute lookup in a match case, i.e.

    @dataclass
    class Record:
        a: int
        b: int
        c: int

    r = Record(1, 2, 3)

    match r:
        case Record(b=var?):
            print(var)

Without it someone might not expect a store; it would look like a keyword argument using a variable, when really it is just matching the next sub-pattern.
@SaschaSchlemmer I'm not sure which part you are giving the thumbs down to; do you prefer the spelling in the current proposal and implementation? I personally find it more surprising on first read; we don't normally expect an assignment to happen on the right side of an equals.

    @dataclass
    class Record:
        a: int
        b: int
        c: int

    r = Record(1, 2, 3)

    match r:
        case Record(b=var):
            print(var)

Or are you saying this sort of thing should be taken out of the current proposal?
I just saw that I'm being mentioned here... I'm still thinking about the big picture on this before suggesting specific local fixes, but just to contribute to the tour that everyone seems to be doing of the ASCII character set: has anybody considered an & prefix? Perhaps it caters to the wrong crowd, but at least C/C++ programmers will see a "pass an l-value" there, which is sort of what we want here. It's just an idea on the table. In any case I'll try to think more on this and see if I come up with something better. (I still think there might be a solution that isn't "finding the right symbol".)
@natelust
I'm not sure which part you are giving the thumbs down to; do you prefer the spelling in the current proposal and implementation? I personally find it more surprising on first read; we don't normally expect an assignment to happen on the right side of an equals.

I very much prefer the current spelling in the proposal (minus the .x syntax for loading variables) and believe the potential for confusion appears mostly in very short, artificial examples without context. For comparison, a more complete real-world-ish example:
    def area(shape):
        match shape:
            case Rectangle(a, b):
                return a * b
            case Circle(r):
                return pi * r**2
            case Ellipse(a=a, b=b) if all((a, b)):
                return pi * a * b
            case Ellipse(a=a, e=e) if all((a, e)):
                return pi * a**2 * sqrt(1 - e**2)
            case Ellipse(alpha=a, beta=b, gamma=c) if all((a, b, c)):
                return 2 * pi / sqrt(4 * a * c - b**2)
My thoughts on the code shown in the example:
- I expect positional arguments for object unpacking to be used substantially more often than keyword arguments (cases 1 and 2)
- positional arguments alone are not sufficient to unpack more "interesting" objects with optional arguments (like my `Ellipse` in cases 3-5)
- keyword assignments will often have the form `x=x` because naming things is hard, and hopefully the person who wrote the class of the object you are unpacking already did a good job with that (cases 3 and 4)
- even when "renaming" arguments by binding to a variable with a different name, the (optional) guard and the return statement or whatever follows in the block make the meaning pretty clear (case 5)
Addendum
we don't normally expect an assignment to happen on the right side of an equals.
In current python we'd also expect something like Circle(1) to construct an object but in the new match this is reversed to deconstruct an object
    ...
    case Circle(1):  # special case for the unit circle
        return pi
So if __match__ does the opposite of __init__ for an object, the reversal in the direction of assignment in pattern matching does make some intuitive sense. The confusion, imho, comes mainly from the tacked-on special case of loading variables (using .x in the currently proposed PEP), which I think is a non-feature that actually detracts from the simplicity of the idea that pattern matching is the opposite of object construction.
I think what you wrote is reasonable, I just happen to hold a different opinion. In your addendum you use the example Circle(1) to match against a circle with unit radius, but that is an example of a load, even if it is just a load-const operation. I personally find that, while a nice feature, `if` guards should only be used when you want to both load and check (like your example), not if you just want an extra match constraint. I would find it tedious if I had to write both match syntax and if syntax all the time. Consider the following case where someone might want a load as part of a pattern match (written with the current syntax).
    def getRadius(obj, reference_frame):
        # only match objects that were created in a given reference frame
        match obj:
            case Point(x, y, z, reference_frame=.reference_frame):
                return (x**2 + y**2 + z**2)**0.5
            case SphericalCoord(r=r, reference_frame=.reference_frame):
                # don't need theta, phi
                return r
            case CylindricalCoord(r=r_xy, z=z, reference_frame=.reference_frame):
                # don't need theta
                return (r_xy**2 + z**2)**0.5
            case Vector(_):
                # Vectors are independent of reference frames, no unpacking needed
                return abs(obj)
            case _:
                raise ReferenceFrameError("Unmatched Reference Frame")
It would be so much more noise, without any actual difference, to say ..., reference_frame=rf) if rf == reference_frame: on each of the match groups. It's not much different than saying Circle(1) if instead of radius being an int it was a Radius object or a variable holding reused info, i.e. Circle(.radius_arg)
Almost all of issue #90 ("Focus") has been on this topic. As I write this there is no solution but a few options are emerging:
- Pick some local syntax to mark all stores, e.g. `%foo` or `<foo>`; see https://github.com/gvanrossum/patma/issues/105#issuecomment-650891493
- Don't support loads at all; see https://github.com/gvanrossum/patma/issues/105#issuecomment-651013054
Worth noting that even if we do remove loads completely, that doesn't totally take away the potential for bugs: I could easily see myself refactoring `case 5:` into `case SOME_CONSTANT:` and forgetting that that's going to be a store instead of a load.
If we lose loads, I think that means we lose the ability to match against types (and not instances) that don't have a custom metaclass. There may be times you want to include some type as well as instances in your match arms. An example of this doesn't jump to mind, so this may not be a big deal, but it's worth noting, as others may have ideas. I just think it would be less appealing to have the following:
    if x is type:
        ...
    else:
        match x:
            ...
This coupled with the bug inducing default of overwriting variables that @JelleZijlstra points out (this keeps biting me) makes me hope to keep loads and mark stores, but I can see the points made for the other way.
- @JelleZijlstra That's a possibility, but I think you would not get very far with such a mistake.
- @natelust What did you mean by `if x is type`? Perhaps you are thinking of some other language's `is` operator?
- @thautwarm or @Tobias-Kohn You seem to be able to quickly check various other language standards. Which other languages (other than Perl :-) use a marker for stores, and which don't? What do the latter do for loads? (I'm interested in somewhat major or influential languages, from F# and Haskell to Scala and Rust, as well as Elixir and Erlang.)
@gvanrossum That was an unfortunate unconscious thing to use as a placeholder for the type of a class itself. I probably should have used something that looked like a class name instead of the concept of a type (expanded a bit from the stub):
    class SomeClass:
        ...

    def will_this_work(x) -> bool:
        if x is SomeClass:
            # can use class attributes
            ...

    result = will_this_work(SomeClass)
But like I said, I can't think of a concrete case where I would want this right now that I would not also say should have control over its metaclass and use it in match directly. It was just an observation that you can't do `case .SomeClass` if there is no load.
Which other languages (other than Perl :-) use a marker for stores, and which don't? What do the latter do for loads?
In all the languages referenced in your comment, they don't use a marker for stores.
For loads, things might be a little complex:
- Erlang and Elixir support loads of regular variables, but need no marker: they distinguish loads/stores by using the scope. Each match in Elixir will shadow variables by default, so to avoid shadowing existing variables, pin operators (`^`) are introduced to explicitly indicate loads.
- All of the others support loads only for enumerations, generally in the form of a capitalized name. (EDIT: An exception is that Scala can use `` `var` `` to mark `var` as a load.) Specifically, Rust and F# can identify the enum value (with load context) depending on the scope, but a warning will be emitted when a store variable is capitalized, or a load enum is in lowercase.
Do those languages that use the case of the first letter in cases also use the case of the first letter in other contexts? This would be a first in Python: until now it's just a suggestion in PEP 8, and I don't even think style tools enforce it, since it's not easy to tell whether a global foo = 1 declares a variable or a constant.
Do those languages that use the case of the first letter in cases also use the case of the first letter in other contexts?
I'm not sure what exactly you mean by "other contexts", but in those languages, capitalized names are always in load contexts; this is mandatory except in F# and Rust.
A discouraged use in F# and Rust is using a capitalized name in store contexts; it works if the compiler doesn't see it in the scope.
This would be a first in Python: until now it's just a suggestion in PEP 8, and I don't even think style tools enforce it, since it's not easy to tell whether a global foo = 1 declares a variable or a constant.
Yes, and I'm not in favor of identifying stores/loads by the case.
Constant value patterns in the current PEP seem traditional, as very few of those languages allow load contexts for regular variables.
Besides, I just found that Scala can use load contexts for any name by wrapping it in backticks: `` `var` ``; but I haven't seen it in any codebase before.
I'm not sure what you accurately mean by "other contexts",
I meant outside the whole pattern matching part of the language. (E.g. IIRC there are languages that require class names to start with a capital letter.)
Perhaps Scala went too far by introducing three mechanisms to pass values (c.x, `y`, and Z). And I'd rather save backticks for some other purpose (e.g. JS tagged template strings). But I could get used to "if it contains a dot or starts with a capital letter", where we consider single names prefixed with a dot as containing a dot. It would mean that you can't capture a variable whose name starts with uppercase, but that's okay with me. (Hm, do CJK languages have a notion of uppercase/lowercase?)
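The "dot or capital letter" rule is easy to state mechanically. A hypothetical sketch of how a compiler or linter might classify pattern names under it; `is_load` and the rule itself come only from this comment, not from anything implemented:

```python
def is_load(name: str) -> bool:
    # Hypothetical rule from the discussion: a pattern name is a load
    # (value pattern) if it contains a dot -- a leading dot counts as
    # "containing a dot" -- or starts with an uppercase letter.
    # Everything else is a capture (store).
    return "." in name or name[:1].isupper()

for n in ("Color.BLACK", ".BLACK", "DEBUG", "x", "point"):
    print(n, "->", "load" if is_load(n) else "store")
```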
I meant outside the whole pattern matching part of the language. (E.g. IIRC there are languages that require class names to start with a capital letter.)
In fact, a "class name" in all those languages should be capitalized; this is for readability and consistency: capitalized names are always in load context, and even where it's not mandatory, it is still a naming convention.
However, there will be no conflict in using lowercase names as class names for our class patterns.
And I'd rather save backticks for some other purpose (e.g. JS tagged template strings).
It's fair, because the use of backticks is quite rare in Scala.
It's not good-looking to appear in pattern matching, and very likely people will try their best to get rid of it.
But I could get used to "if it contains a dot or starts with a capital letter", where we consider single names prefixed with a dot as containing a dot.
Some concerns here:
I talked with some Python developers in Japan and China about PEP 622. They feel negative about a pattern that looks like .BLACK but are quite okay with Color.BLACK.
I somewhat doubt whether we really need to support load contexts for a single name occurring in patterns. Most languages don't have this, and a qualified name Color.BLACK is sufficient to represent constant value patterns.
do CJK languages have a notion of uppercase/lowercase?
No. If anything, just some memes for that...
I mostly agree with @thautwarm . There is one other language missing so far. Swift seems to have load semantics by default and requires either let or var to mark a name as having store semantics.
However, in contrast to all other languages that we were talking about so far, Swift clearly uses a "switch on steroids"-approach and therefore differs from ours. Also, the idea is that new variables need to be declared, which also significantly differs from Python.
An interesting side note: in most cases the respective languages/communities recommend using guards for load and compare semantics (also known as an 'equality pattern'), which effectively leads back to Python's if-elif-else chains.
Concerning the backticks in Scala: their primary intended use case is not so much load and compare semantics in pattern matching, but rather allowing keywords to be used as identifiers. Think, for instance, of the problem that match is both a keyword for pattern matching and a common method name on regular expression objects. After making match a keyword, you can use backticks to basically write re.'match'() (I cannot use backticks here because of markdown syntax). We solve the same problem with soft keywords, of course.
Since Scala has to seamlessly interoperate with Java—which has a different set of keywords—this mechanism is mostly needed because of this disparity. Jython has the same issue, of course, but uses a different approach. If a keyword is part of a dotted name (in particular: prefixed by a dot), it is treated as an identifier instead, allowing you to write System.out.print even back in the days when print still was a keyword in Python. I would therefore go so far and say that there is actually some precedent in the Python universe of treating dotted names separately, as well as for using soft keywords.
@natelust
def getRadius(obj, reference_frame):
    # only match objects that were created in a given reference frame
    match obj:
        case Point(x, y, z, reference_frame=.reference_frame):
            return (x**2+y**2+z**2)**0.5
        case SphericalCoord(r=r, reference_frame=.reference_frame):
            # don't need theta, phi
            return r
        case CylindricalCoord(r=r_xy, z=z, reference_frame=.reference_frame):
            # don't need theta
            return (r_xy**2 + z**2)**0.5
        case Vector(_):
            # Vectors are independent of reference frames, no unpacking needed
            return abs(obj)
        case _:
            raise ReferenceFrameError("Unmatched Reference Frame")
This is the first compelling example for load semantics that I have seen so far where I'd actually like to be able to load a non-dotted name. I totally hadn't thought about that use case of loading a variable passed into a function. I still believe `x` would be a good candidate for signaling a load (and there is precedent in Scala doing the same).
def getRadius(obj, reference_frame):
# only match objects that were created in a given reference frame
match obj:
case Point(x, y, z, reference_frame=`reference_frame`):
return (x**2+y**2+z**2)**0.5
case SphericalCoord(r=r, reference_frame=`reference_frame`):
# don't need theta, phi
return r
case CylindricalCoord(r=r_xy, z=z, reference_frame=`reference_frame`):
# don't need theta
return (r_xy**2 + z**2)**0.5
case Vector(_):
# Vectors are independent of reference frames, no unpacking needed
return abs(obj)
case _:
raise ReferenceFrameError("Unmatched Reference Frame")
I must admit I find @natelust's example not all that compelling. What I do like about it is that it is one of the first examples that brings up loads of 'plain variables' not only in the global context, but offers a meaningful example with local variables/parameters. From that point of view, it is an excellent example that actually provides some context for a discussion.
Ultimately, however, I see pattern matching as something that is supposed to be as static and stable as possible. The one variable in it is the subject (the x in match x:). Ideally, everything else is fixed. In the example above, for instance, I would not want Point or Vector, say, to change meaning between invocations of the match. This is perhaps also emphasised by the fact that patterns cannot contain arbitrary expressions to be evaluated, but rely on things like enumerations and classes, which are supposed to be pretty stable.
So, the use of a parameter as a 'pseudo'-constant in patterns seems rather alien to me. This is, in fact, an excellent use case for guards, allowing us to combine the static pattern with a dynamic test.
Quoting from @natelust's comment above (highlights from me):
> I personally find that while a nice feature, if guards should only be used when you want to both load and check (like your example), not if you just want an extra match constraint.
The thing is that 'load semantics' is kind of a misnomer here. Whenever we are talking about 'load semantics', what we really mean is 'load and check', or perhaps rather 'load and compare'. Strict load semantics is what we do with classes and attributes, not with variables.
@Tobias-Kohn can you clarify what you mean about "change its meaning between invocations of the match". What I had envisioned with this example was some set of objects which could be defined with an arbitrary origin point (the reference frame). This point would be fixed from the point of view of an instance of one of these classes, but in general could be any point, and thus is not supported well by something like an enum.
This function then allows a reusable way to fetch a radius from objects supported by the match statement, but only if the objects are defined with a given origin (except a vector). If this was a language with generics, you could even template getRadius on the reference_frame argument instead (though that might be difficult unless you could know the values for the ones you really care about in your code), as it is constant from the point of view of the match statement.
You are right that this is a load and check, as any sort of matching would need to be. I guess in my mind the difference was that it would happen in the match itself, rather than bind to a variable to be checked with a guard afterwards. The latter seems a waste if the variable is bound to a name for no other reason than the guard, as then there is another variable in local scope (though I guess that is not really that big a deal, micro-optimizations and boilerplate aside). My example could be done with guards just as well, as I noted; it would just involve larger statement blocks (especially if there were multiple things to guard) and would look weirdly asymmetric if matching did allow checking against true constants.
@SaschaSchlemmer Yeah, I was only writing things out in the current notation for the sake of arguing why I didn't want "load" to be taken out of the proposal. I am really not fond of the leading . as a way to denote that.
Just to make sure we are on the same page: in no way do I object to your example as such! I think it is a very nice and valid example of code in what it does. The only aspect where we differ is that I think guards are actually the correct means of implementing the access to the parameter/local variable reference_frame.
A pattern specifies a possible template for the subject we want to match. This template is first and foremost a question of structure. That is, case Vector(x, y, z): is probably supposed to succeed whenever the subject we match has the structure of a (3D) vector. Depending on the issue at hand, we might want to write something like case Vector(x, y, 0): to mean a vector that lives in the xy-plane. But in either case: the structural template that we are matching against is quite static. In a way this is similar to function overloading in other languages: if I know the types of the arguments involved, I can figure out which function/implementation will be effectively called.
After this first line of static patterns, we have guards to bring in a more dynamic nature. They allow us to see if a subject matching a certain static structure fulfills additional properties and constraints beyond the purely structural. More or less in any language other than Python, it is here that we put comparisons to variables and other changing entities.
In this separation of a fixed structure on the one hand and the moving parts on the other, I would certainly count a parameter as a moving, dynamically changing part and therefore as something that belongs in the guards part. Even if we know that the parameter can only take on boolean values or values from an enum, say, it is still something that is not fixed, and I would therefore not place it inside a pattern.
Because of Python's dynamic nature, we cannot (and probably should not) strictly enforce patterns to be purely static entities. And even though this is off the table and goes way beyond this PEP, separating static and dynamic parts would also allow optimising interpreters like PyPy or Numba to create something like a decision tree/finite-state machine/automaton to quickly find the correct case clause.
Let me conclude with yet another example to perhaps illustrate what I mean—without pattern matching or this PEP. Python functions can have any number of parameters. In the grammar of Python, this is basically described as something of the form (I am leaving out a lot of irrelevant details here):
function_def ::= "def" name "(" [arg_list] ")" ":" ...
arg_list = name ("," name)*
This works very well in describing the "static" aspect of the syntax for defining a function. However, it does not capture the aspect that no two parameters can share the same name. In other words, def foo(x, x): is clearly wrong. But this aspect is caught after the syntactic analysis, by additional guards checking the integrity of the code.
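This two-stage behaviour is easy to observe: the parser-level check can be probed with `compile()`, which rejects the duplicate parameter even though the grammar rule above would happily derive it. The helper function below is invented for illustration.

```python
def is_valid_def(source):
    """Return True if `source` compiles, False if it raises a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        # e.g. "duplicate argument 'x' in function definition"
        return False

# structurally fine and semantically fine
assert is_valid_def("def foo(x, y): pass")
# structurally fine, but rejected by the post-parse integrity check
assert not is_valid_def("def foo(x, x): pass")
```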
Circling back to pattern matching: my point is that patterns themselves are a bit like a static grammar, describing unchanging and immutable structure of data. As this is clearly not enough, we also need guards to complement this with further constraints and by bringing in perhaps some local or temporal context.
I hope this explains why I think that a function parameter does not belong into the pattern itself, but rather in the guard section.
@Tobias-Kohn That was very well written, and I think I agree with you in principle but maybe not in pragmatism. If you allow some specialization of arguments, as in case Vector(x, y, 0), it feels off to me to say "but that's OK because it is static" without some other justification (code complexity, performance, etc.). 0 is just a value I know at 'compile time'; the code would be just the same if it were substituted for another value.
The pragmatic part of me just feels that it is a waste if you have a way to spell a comparison inline within a match, but choose to insist on using other syntax to accomplish the same thing for no other reason than it feels nice to break apart the static and dynamic parts.
I initially did hold a viewpoint similar to yours. The day before this PEP was sent out, I sent my own independently developed match proposal to the python-ideas mailing list, where I only supported extracting out parameters (this was partly due to the constraints of working on it by myself). I had figured any other processing belonged after the match (mine was just a proof of concept and didn't have the concept of guards either). After working through what was in this proposal I came to enjoy the fact that I could match within a case statement. I liked that I could match object types too, not just instances (match var: case .str: ...). However, your argument does appeal to me in the aesthetic sense, and so I am less certain of how I feel than before your message; thanks for taking the time to walk me through it.
This is a very interesting discussion. The idea of using a function parameter as a value pattern undercuts one of the proposed solutions for marking values: the use of a capital letter to indicate a constant.
@natelust Interestingly enough, I think I had load semantics in my initial draft/proof-of-concept, but have come around to be more cautious about it. There are two main reasons why I prefer such a distinction between static structure and dynamic parts.
1. I often work with 'alternative' Python implementations such as, e.g., Jython, Skulpt, or MicroPython, say. This made me very aware that Python code sometimes runs on unexpected systems and has to interact with other platforms than what we might have in mind when discussing new features. Moreover, JIT compilers like Numba can have an enormous impact on whether a simulation, say, runs within reasonable time to actually be useful for scientific investigation. Having a rather strict separation between what can be statically determined and what is rather dynamic behaviour can have a huge impact on how all these 'non-standard' Python implementations compile and run the code.
This PEP is clearly not about performance or runtime efficiency. Still, I feel it might be a good idea to sometimes try and look at what might follow later on. I have a hunch that a strict separation between static and dynamic parts means that Jython could potentially compile Python pattern matching to the same optimised code as Scala does, and I am quite sure that the Numba guys might do some real magic even beyond that. Hence, from my point of view, this separation is a question of pragmatics :wink:.
2. As I have noted before somewhere else, load and compare semantics means, in principle, that we could also have something like case (x, 'x'): to check whether the match subject is a tuple with two equal values. As the Haskell community has pointed out, however, this comes with the problem that, in general, we cannot decide for every possible tuple (x, y) whether x and y are "equal". The example given by the Haskellers is with functions—a type of object which we cannot really compare (other than for identity).
During the discussion for this PEP, we have also briefly discussed the issue of floating point numbers (issue #16), which also often defy exact comparison in unexpected ways (see, e.g. the famous missile error).
Python has a very pragmatic approach to equality between objects that works pretty well. We could therefore rely on this notion of equality that is baked into Python already. But given that comparison for equality is problematic by its very nature, and might thus lead to quite surprising behaviour and hard-to-diagnose bugs, I feel that an approach of EBTI ("explicit is better than implicit") is well warranted in this case.
We do support literals for implicit comparison, which means that the range of permissible objects for implicit comparison is very limited. If you explicitly write 0.1 into your pattern matching statement, I expect that you know what you are doing and are aware of floating point rounding errors (or are about to find out the hard way :wink:). But having implicit comparison in general with dynamically changing and uncontrolled values makes me kind of uneasy.
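A small illustration of why implicit equality on floats is treacherous: the "same" value computed two ways need not compare exactly equal, which is precisely what an implicit comparison in a pattern would rely on.

```python
import math

# the "same" value, computed by repeated addition, is not exactly 0.3
total = 0.1 + 0.1 + 0.1

assert total != 0.3                # exact comparison fails (0.30000000000000004)
assert math.isclose(total, 0.3)    # an explicit, tolerance-based check succeeds
```

A literal float pattern would perform the first, exact comparison, and could silently fail to match.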
Using guards does not make this issue go away, of course! But it forces us to make the comparison apparent and visible.
@Tobias-Kohn those are definitely some good points. Alternative interpreters are not something I interact with frequently and I definitely hadn't put much thought into them, so thanks for expanding my thinking. I am now more convinced by your argument and would be happy to see it go the way you describe, but with one thing still holding me to the load option: sub-patterns.
I know they are not quite the same, but a sub-pattern essentially says: load a class from a name and see if you can de-structure an attribute to match, and fail if you cannot. I know a class is spelled out when you write the match statement, unlike a variable, and hence is static from that perspective (well, as static as a name in Python can be).
From a teaching perspective, though, I know I am going to have issues when teaching students why sub-patterns and constants can be used to constrain a match, but variables need a match guard. Students are just going to see things that look similar and trip themselves up. While that is a concern, it is probably not large enough to guide a decision like this.
Before I write this next bit, I just want to make clear I am not endorsing this pattern, but I am also not criticizing any future person that does something like this either. I know it is not the job of the PEP to protect people from themselves, but in the field I work in it is quite common to see all sorts of workarounds created. I don't know if it is worth considering this in the decision making or not, but you are likely to see something like the following pop up on PyPI at some point. If this veers into the realm of "that's not something we care about when working on this PEP", then forget I wrote anything here.

Possible PyPI package: match_static
class MetaStatic(type):
def __new__(meta, name, bases, dct, **kwargs):
dct['value'] = kwargs['value']
return super().__new__(meta, name, bases, dct)
def __instancecheck__(cls, instance):
return instance == cls.value
def make_static(value):
class Static(value=value, metaclass=MetaStatic):
pass
return Static
Use match_static
def getRadius(obj, reference_frame):
# only match objects that were created in a given reference frame
RF = match_static.make_static(reference_frame)
match obj:
# Someone might even get a named reference to the var here instead of _
case Point(x, y, z, reference_frame=RF(_)):
return (x**2+y**2+z**2)**0.5
case SphericalCoord(r=r, reference_frame=RF(_)):
# don't need theta, phi
return r
case CylindricalCoord(r=r_xy, z=z, reference_frame=RF(_)):
# don't need theta
return (r_xy**2 + z**2)**0.5
case Vector(_):
# Vectors are independent of reference frames, no unpacking needed
return abs(obj)
case _:
raise ReferenceFrameError("Unmatched Reference Frame")
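For what it's worth, the same trick can be written a little more compactly by building the wrapper class through the three-argument metaclass call, which sidesteps class keyword arguments entirely. This is a sketch of the mechanism, not an endorsement of the workaround.

```python
class MetaStatic(type):
    def __instancecheck__(cls, instance):
        # isinstance(x, RF) becomes an equality test against the wrapped value
        return instance == cls.value

def make_static(value):
    # create the class directly via the three-argument call, so no class
    # keyword arguments (and no __new__ override) are needed
    return MetaStatic("Static", (), {"value": value})

RF = make_static("lab")
assert isinstance("lab", RF)        # equality smuggled through isinstance
assert not isinstance("icrs", RF)
```

Since class patterns are checked with isinstance, such a class would turn a structural check into an equality comparison, exactly the behaviour the guard would otherwise make explicit.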
I would imagine this would be more likely to be (invented and) used if there were several variables in the guard leading to long lines. This might still not be an issue for something like a JIT compiler, as it might be able to know enough from the form of the code; I don't know enough about JITs to say. This is really more a question of whether it is better for communities to define this sort of thing, or better to define what a load will look like in the match protocol (and maybe that is what you are doing by saying "we expect guards"). I am unfamiliar with the process of PEPs, so I really don't know if this is the sort of question that should be brought up or not. I am learning a great deal with all this (both coding and process), so thank you all for your time and understanding.
@natelust I think you bring up good and valid points. An important part of this design process is to consider the ramifications of our decisions, and as such I would not classify anything you brought up as "we don't care about" at all. As I mentioned before: a good example or question can go a long way in helping to clarify what the result should be—even if that result ends up not including that specific feature and naturally cannot show the whole process that led to it.
You are quite right, though. There is no way we can ensure that the classes involved are truly 'static' and don't change meaning somewhere between invocations of the pattern matching. And I agree with you that we will probably find some code like this in one application or another. In this particular instance it seems like an awful lot of work just to avoid having guards :wink:.
In the end, it probably comes down to this: if you write 'clean' code, then the compiler/interpreter will probably be able to give you some good performance. If, on the other hand, you try to outsmart the compiler or language, there is little that can be done, and you are on your own. This is, of course, true of all (optimising) compilers and not a specific trait of Python. So, my argument in this regard is that we should design pattern matching so that there is some (more or less) obvious form of 'clean' code, of the way it is supposed to work. Follow these guidelines and be rewarded...
The tension between load and store also lies in the very nature of pattern matching itself. We already have a simple form of pattern matching in Python through tuple unpacking: (x, y, z) = .... The core of what we are trying to achieve here is to expand the left hand side, i.e. the assignment targets to allow for more structure than just sequences. This can be as simple as (granted, the vector is basically still a sequence, but bear with me):
Vector(x, y, z) = ...
# or perhaps
Tree(Tree(a1, a2), Tree(b1, b2)) = my_node
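As an aside, today's sequence unpacking already handles nested structure on the left-hand side; the hypothetical `Tree(...) = my_node` form would generalize exactly this from sequences to arbitrary classes.

```python
# nested sequence structure as an assignment target works today
my_node = ((1, 2), (3, 4))
((a1, a2), (b1, b2)) = my_node

assert (a1, a2, b1, b2) == (1, 2, 3, 4)
```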
The point here is this: the desire to have more structure than just sequences on the left hand side means that we need some way to express this structure. Classes are natural candidates for this, particularly since they already proved to be very versatile and efficient in expressing structure. But this leaves us with some kind of dilemma: how does the compiler know which parts are actual assignment targets, and which parts are classes meant to express the desired structure?
The static approach is to use some kind of syntactic clue. We might notice, for instance, that all classes denoting structure are followed by an opening parenthesis. So, why not specify that any name followed by such an opening parenthesis shall be a class name giving structure, while everything else is meant to be an assignment target? Alternatively, we could add some more visual clutter and put the classes into angle brackets, say. This means we would write:
<Tree>(<Tree>(a1, a2), <Tree>(b1, b2)) = my_node
But that's an awful lot of noise for not really that much gain over the initial rule that classes are followed by a parenthesis.
The dynamic approach would be to go and check for each name occurring in the pattern, whether it is already defined, and if so, whether it is a class that we can use to de-construct the data. However, this leads to two problems. First, it quickly becomes highly inefficient to execute such code. Second, what if somewhere in the global context of a large module, you accidentally define a variable with name a1 or b2 or something? All of a sudden, the patterns using such variable names have an entirely new meaning. Hence, things become quite unpredictable.
Eventually, pattern matching has turned out to be particularly useful when the assignment can happen conditionally and we can give the subject to be matched some leeway concerning its structure. This led to the syntax that resembles very much a switch statement—although the notion behind it is quite different. And because switch statements are not about assignments, but about selection, they use load semantics for variables. The similarities and overlaps between the two concepts thus lead to some clashes, and we need to find a way to have the best of both worlds. Spoiler: there is no perfect solution to this.
In the end, I think if we approach pattern matching from the angle of sequence unpacking and assignments, it should become manageable even for students :).
@Tobias-Kohn I think you are right that it will just be about finding the right way to teach it, and everything will work out for the best. I think your example of unpacking the Tree would be a great launching point in that regard. The teaching aspect is something I am quite interested in, and I may take a crack at writing something introductory once the proposal stabilizes a bit. I find that I learn a bit more or get a different perspective with each message you write, so thanks!
As an aside, that does seem like a lot of work to avoid an if guard in the example I wrote, but I could see the appeal of this approach to people who have 7 different kinds of objects with 10 fields, 6 of which are to be checked in a guard. This may be something like classes representing types of media in a library, or some kind of database table records, etc. In that case, having one reusable library and 6 lines of definitions would likely be appealing over long lines of guards, possibly with many line breaks, duplicated on each arm (and then imagine many match statements in a code base). It may then be common to see this approach in all but the simplest guards (or in cases where guards are not shared by multiple arms). However, these are all hypotheticals, and the question was more about letting the community decide the direction (which may just be to stick with the PEP) or the PEP itself (which, if I understand your response, was the part about the optimizations you gain by using it as proposed).
This is now our final open issue before we can start seriously revising the PEP (#113 -- it needs a lot more clarity in the motivation etc.).
Solutions that I would find acceptable (from most to least preferred):
- use the UPPERCASE rule plus the dot-in-the-middle rule, but drop the leading-dot rule (pragmatic)
- just the dot-in-the-middle rule (purist)
- all three (compromise)
- dot-in-the-middle and leading dot (what's currently in the PEP)
If three of the other four authors favored marking loads with some other syntax (either a prefix or suffix sigil or angular brackets or backticks), I might go along but I'd still hate it.
However what I'd most hate would be marking stores with a sigil or special brackets (backticks, <> etc.). This breaks the analogy between
a, b = value
and
match value:
case a, b:
print("Yes")
Note that the UPPERCASE rule would mean that we wouldn't have to make any use of sigils or other markup for loads in examples/expr.py even if we changed import tokenize to from tokenize import * (and removed the tokenize. prefix from all those constants). This feels ideal to me.
For marking loads, I don't really have a preference.
I know that backticks were mentioned earlier, but I don't know if anyone specifically floated the idea of using backticks to indicate a load (as opposed to a store):
case Colors.BLACK:
# stuff
case `BLACK`:
# different stuff
Which makes BLACK kind of look like a literal if you squint.
This may be opening up a can of worms, but technically you could put any arbitrary expression within the backticks and the pattern machinery would treat it as a constant.
case `Colors["BLACK"]`:
# different stuff
Tobias wrote about this. Scala allows backticks for loads, but apparently they aren't popular in patterns. (It also has UPPER and dot-in-middle.)
I was hoping to reserve backticks for tagged template strings like in JS.
@viridia The backticks were on the table somewhere, as this is exactly what, e.g., Scala does. The potential additional benefit that we could get out of backticks is that we could also mark names in other places. As I mentioned before somewhere: Jython (which is still based on Python 2) has the problem that print is a keyword but also a common method name in the Java libraries to be invoked. Hence, writing foo.'print' could potentially solve this issue. On the other hand, Guido mentioned that we should be careful with using backticks too lightly for such a rather narrow use case, and I tend to quite agree with him.
The problem with Colors["BLACK"] is something that immediately comes up as soon as we have any kind of special marker, rather than a rule. Even if we used $, say, you could still argue that you might write $(Colors["BLACK"]).
@gvanrossum I would go either for the purist or the pragmatic rule. I like the leading dot rule a lot because it feels like a clever way to make a general rule applicable, but I also understand the general reservations about it. Python is mostly quite robust in that tiny changes to the code seldom completely change the meaning of a program—I would regard this as one of the core features that make Python such a good choice in education.
I am not entirely convinced that we need the uppercase rule, and I see two potential issues with it. First, there might be just as much resistance as in the case of the leading dot (sorry, I lost the thread on the dev-mailing list, so this might have been discussed there already). Second, it could be very tempting/inviting to just do something like:
def foo(x, Y):
match x:
case Y:
...
The more thought I have given it, the more I am convinced that we really should be careful with 'load and compare' semantics. Given all the issues and potential pitfalls that come with 'load and compare' semantics, I would certainly prefer the purist approach with only literals and dotted names (even the dotted names are already a compromise from the purist view :wink:). If it turns out that we need some load marker later on, we can still introduce it.
Good points.
There's one other idea I want to throw out there - again, I am just brainstorming here, don't expect me to put up a serious defense of any of these ideas - is to say, well, since the 'load and compare' is using equality to compare the name, then use the equality operator:
match x:
case ==BLACK:
Yeah it looks ugly. But it avoids the problem with leading dot that people raised, which is that dot is such a small glyph that people might not notice it. This certainly doesn't suffer from that problem. And, maybe ugliness is a virtue - in the sense that we might want to gently discourage people from using unqualified names.
The idea with the equality operator becomes quite weird as soon as we put it into the context of attributes/arguments.
match x:
case Circle(color===BLACK):
...
Maybe, we would have to make sure there is a space separating it properly, i.e. writing color= ==BLACK. But anyway, I don't think it would really work.
Besides that, if we are considering switch-statement semantics, you could write:
match x:
case ==BLACK:
...
case ==RED:
...
instead of
if x == BLACK:
...
elif x == RED:
...
Hence, I think this is a perfect example to show that we do not need to cover the switch statement, because if-elif-else chains already cover that nicely :).
I hope it won't be held against me when I throw (yet another) new idea into the ring at this stage. We all agree that the major issue of the PEP is that store and load are at odds with each other, yet there are compelling arguments for having both of them. Seeing the latest post from @viridia led me to think about this whole affair from the perspective of operators vs. keywords again. Could we not solve the issue of differentiating store from load with a keyword instead of an operator-like symbol (e.g. x?, x!, $x, &x) or an adornment (`x`, <x>)? To be honest, none of them feels quite pythonic (some of them even look a bit like Perl if you squint 🤢). So how would this look with keywords instead:
HENRYS_FAV_COLOR = 'black'
match random_new_car():
case Car(color=HENRYS_FAV_COLOR): # load syntax
print('Henry Ford approves')
as Car(color=color): # store syntax
print(f'Ugh, not another {color} car')
Pros:
- the same clutter free syntax can be used for both load and store
- both FP-zealots and switch-aficionados get their shiny new toy and both use cases are clearly separated (yet can still be composed; see the next bullet point). This basically doubles the number of people who will want this PEP to get accepted instead of fighting in the trenches for their personal favorite.
- the syntax maps well to problems where there are a few special cases one likes to compare against and one (or more) general cases that have to be matched against a pattern. If there were no load syntax altogether, everything would either need to be spelled as a guard or with a combination of `match` and `if ... elif ... else`.
- we could allow arbitrary expressions for load semantics, e.g. `case Car(color=people['Henry'].favourite_color())`, because there is no weird symbol or adornment getting in the way.
- avoids introducing special meaning for things like capitalization, e.g. treating `FOO` or `Foo` with load semantics compared with `foo`
Neutral (either Pro or Con, depending on your take on the issue)
- load and store semantics are defined for the whole arm/branch and not at the level of individual variables. In my personal opinion this makes the statement much clearer to read (and explain to beginners). On the other hand, use cases like the one suggested by @natelust, where one wants to use both store and load in one arm/branch, are not possible. In those cases one would have to use store semantics with an appropriate guard:
match obj: case Point(x, y, z, reference_frame=.reference_frame): return (x**2+y**2+z**2)**0.5
Cons:
- yet another keyword (or an existing keyword used in a novel way)
- there is the question of which keywords to use. Looking at the example above, where I used `case` for load and `as` for store, I feel it would be aesthetically pleasing if both keywords had the same number of letters, so that the code after the keywords would align vertically; maybe `case` and `with` would fit that bill? OTOH, both keywords should also be the same type of word; maybe `case` for load and `pattern` for store?
Edit:
A syntax variant with keywords of the same length:

```python
match random_new_car():
    case Car(color=HENRYS_FAV_COLOR):
        print('Henry Ford approves')
    with Car(color=color):
        print(f'Ugh, not another {color} car')
```
If I try to summarise @SaschaSchlemmer's suggestion, there are three levels at which the switch/load and the match/store semantics could potentially coexist:
- Coarse grained: by having both a `switch` and a `match` statement;
- Sascha's middle path: by having two different types of `case` clauses;
- Fine grained: by controlling the semantics of individual tokens.
The new idea is this middle path, where we differentiate between load and store semantics at the clause level. @SaschaSchlemmer: please correct me if I am wrong.
Of these possibilities, the first one is clearly beyond the scope of this PEP, since a switch statement could be introduced completely independently of our work here (whether a switch statement makes sense is a different story, of course). The second one now marries the two concepts just enough to consolidate them under a common umbrella: the top-level match statement.
I am highly sceptical as to whether this approach could work, as I see a couple of issues here. In short: I think it ends up being an overly complex solution that does not fully solve the problem. We'd pay a high price for little gain.
- The overall direction of this PEP is to chip away anything we do not really need and concentrate on the absolute core set of features needed to have some basic, yet usable and versatile, pattern matching. In contrast, the idea with two different case clause types follows a "let's have it all" approach, which historically has a very poor success rate, I am afraid, and rather goes against the Zen of Python.
- As the suggestion also points out: it does not really solve the issue brought up by Nate. In fact, I wonder if there are any real use cases where I would want to mix "switch" case clauses with "match" case clauses. As it clearly does not solve the fine-grained use cases, and if there are no compelling examples why `case` and `as`/`with` should be mixed within the same statement, this basically falls back to the first, coarse-grained, case. Hence, we had better separate the two completely (theoretically, a "switch PEP" could still propose to reuse `match` as the top-level keyword).
@Tobias-Kohn
You summarized my train of thought pretty well (although I didn't explicitly think about the first level, since this use case is already covered pretty well by `if ... elif ... else`). It seemed to me that cramming all the features (store and load) together (the third level) makes match quite powerful, but maybe just too hard to explain (and thus to sell the PEP).
- for the example @natelust brought forward (mixing load and store semantics in one arm/branch), I really think using a guard is the one obvious way to do it, and I argue that we should explicitly reject special syntax for this use case.
- for the simpler use case of load-and-compare I am not so sure; people in this thread and elsewhere appear to like/expect the feature. Let me quote:
@gvanrossum
The main use case is definitely FP style structure unpacking. An early proposal didn’t even have constant value patterns. But named constants and enums are very much part of Python’s culture and we felt we had to support them.
Allowing `case Car(color=HENRYS_FAV_COLOR)` would replace a lot of `if isinstance(some_car, Car) and some_car.color == HENRYS_FAV_COLOR`. Notice here that this use case is not equivalent to a C-style switch statement (that should just be written using `if FOO ... elif BAR ... else ...`) but also features the same structure unpacking. There is also precedent in other languages that have similar features (albeit with some not-so-nice syntactical choices). Marrying both concepts at the second level felt like a good solution for the common cases to me.
- having two keywords attached to `match` is also a two-way door, in that we could introduce `match` with store semantics now and have (extended) load semantics with the second keyword later, in another PEP. This would also strip the current PEP of the exception made for dotted names etc., and reduce `match` with the store clause to binding variables and comparison with literal values.
@SaschaSchlemmer It seems that we are generally quite in agreement with only minor points where we differ. And I would certainly be interested if you know of a compelling example that shows that the two case clauses really make sense.
having two keywords attached to match also is a two-way door [...]
While I totally agree with you that this could (and probably should) be addressed in another PEP, this really is a one-way door. Once we introduce two different case clauses, there is no going back. We should therefore not do something like this lightly.
@Tobias-Kohn I meant that the two-way door is that we could introduce match and a case clause with store semantics now and decide later whether we'd like load semantics at the token level or via another clause. No need for deciding for/against the second case clause now (except maybe for some foresight in naming the case clause proposed in this pep)
I find it unacceptable that load vs. store applies to the whole clause. Also when I first read Sascha’s first example I didn’t understand it because I didn’t notice the ‘as’.
Maybe we could debate the UPPERCASE rule, and decide first two preferences. In Scala the UPPER rule seems to work well. Does Rust have it?
@Tobias-Kohn to be honest I have not seen a convincing example where I'd prefer load semantics (except for literal values) over using store semantics and an appropriate guard.
I still dislike the uppercase rule, because the language is now enforcing a convention (and one that may not always be "correct" in this context). It also only enforces it in this very narrow use case. Both of these points make it feel more "bolted-on" than the other options.
There are also cases that aren't obvious to me:
- What about names that start with underscores? I often have private module-level names like `_PATH` or `_START_DATE`. Do we just skip over all underscores first? I believe Scala treats them as lowercase.
- More generally, what about names using characters with no concept of upper- or lowercase?
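Both edge cases are easy to probe with `str.isupper()`, which is roughly the check an UPPERCASE rule would need (the character choices below are just illustrative):

```python
# str.isupper() skips uncased characters such as underscores and digits,
# but returns False for strings with no cased characters at all.
print("_PATH".isupper())  # underscores are ignored -> True
print("_path".isupper())  # -> False
print("名字".isupper())    # CJK: no cased characters -> False
print("名字".islower())    # ...and not lowercase either -> False
```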
It's also worth considering how easy it is to correct unintentional stores when they're found. I've recently added a syntax warning for some trivial cases (prompted by a recent mailing list discussion, and not pushed yet):
```
>>> match 42:
...     case foo: pass
...     case bar: pass
...     case _: pass
...
<stdin>:2: SyntaxWarning: unguarded name capture pattern makes remaining cases unreachable; did you forget a leading dot?
```
This simple action of adding a . in one place becomes more complicated for some of the alternatives:
- pragmatic: "... consider renaming to UPPERCASE?"
- purist: "... consider refactoring to use a qualified (dotted) name?"
I'll need to think about this more. Right now I pretty strongly prefer "PEP" and "purist", and pretty strongly dislike "pragmatic" and "compromise" ("pragmatic" and "purist" are pretty loaded names when discussing Python language design, by the way... :wink:).
Strong strong strong dislike of backticks, though, because they look like strings and are a total pain to discuss/document in markup environments (this sentence alone has 11 ` characters in it).
Either way, I think it's important to constantly emphasize (especially when discussing name patterns) that we are creating this feature specifically for destructuring, not switching. That should help reduce pushback from people who want to adorn stores rather than loads (or feel that rules like "purist" aren't powerful enough).
@gvanrossum I don't believe it does. To my knowledge, in Rust you either must use a match guard or their binding operator in cases like these (I am not a Rust expert, though). The binding operator does a load, compares, then stores; an example can be found here (it shows matching a pattern, but it may be a variable as well, in a limited sense). In Python (using their same at-symbol) that might be spelled:
```python
number_of_doors = 4
match random_new_car():
    case Car(color=color, doors=doors@number_of_doors):
        print(f"This is a {color} car guaranteed to have {doors} doors")
```
where the value is stored in `doors`, which I guess could be left as `_` in the case you only wanted a constraint.
Edit: Fixed a typo. And I want to highlight that I think syntax like this has been discussed and was not favored; I only wanted to compare to what is in Rust.
Big +1 from me for @brandtbucher pointing out the difficulties with the uppercase rule! I hadn't thought of that, but I think these two issues (leading underscores and non-Latin names) are quite valid. Of course, a firm rule will answer these questions, but it shows nonetheless that it might not be quite as straightforward as at least I had thought.
I also like the SyntaxWarning! Very nice indeed!
While I am certainly not too eager to go for the backticks rule, I am not entirely sure whether the use of the language in markdown can be a strong concern. After all, it would be intended as a rather 'obscure' feature to be used sparingly.
I also like the SyntaxWarning! Very nice indeed!
I knew you would like that.
I am not entirely sure whether the use of the language in markdown can be a strong concern.
Alright, you're in charge of writing the RST docs if we go this route. :wink:
Just to double check, is there anyone here that is still against default bind(store) semantics and prefers evaluate(load) ? I know I mentioned some misgivings at some point but I'm generally onboard with binding by default (I'm asking because of brandt's comment about «help reduce pushback from people who want to adorn stores rather than loads »)
The uppercase rule builds on the convention and idea that constants are written in uppercase letters. In Scala (where the uppercase rule is applied), there is also the Java convention of writing all classes with an uppercase letter. This means that, e.g., the load semantics of `Point` in `Point(x, y)` is already established by the name `Point` itself.
In Python, we face several difficulties with this rule:
- Many classes and types are written entirely in lowercase. The load semantics of class names must therefore be established by the following parenthesis rather than the name itself.
- There is no such thing as a constant in the strict sense, nor is there anything that would prevent local variables and parameters from using uppercase letters. Since I would rather not have too much 'load and compare' semantics in the patterns, I see the doors wide open for misuse (but there might be differing opinions on that).
- As Brandt has just pointed out: there are many possible names that are neither clearly uppercase, nor clearly not.
- There is quite some resistance from people who are uncomfortable with the introduction of such a rule that has no precedent in Python so far.
In favour of the uppercase rule, we find that it is quite simple, and solves the load/store problem without additional syntactic clutter. It thus has the potential to be a viable compromise between the two groups. On the other hand, having load and compare semantics for dotted names seems to cover enough cases as far as I am concerned.
We seem to have agreement that dot-in-middle (a.b) is in and leading-dot (.b) is out. Also that stores don't need sigils, and that we'd rather not use sigils or other markers for loads.
Which leaves the choice: Do we use some form of the UPPERCASE rule or not? Let's have a vote among the authors.
If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)
I vote no uppercase.
I still like the leading dot, actually... though I recognize that adding it back later is painless.
I also vote no uppercase.
However: as I understand it, the Unicode standard has an "uppercase" property for each character that specifies whether it is uppercase or not. Ignoring leading underscores also seems reasonable enough. But since my reservations are primarily about aspects other than whether we can determine if something is uppercase or not, I am still in favour of not implementing this rule.
Note that even with the purist approach, there are ways to match unqualified names, using either guards or custom matchers. I recognize that using it this way is quite ugly and verbose. It might make sense if you had a match statement with a bunch of qualified names and needed one special case for an unqualified name, but if you had a bunch of unqualified names, the burden would be great enough to force most people not to use a match statement at all.
@viridia Could you vote? If you're against UPPER it's decided. If you're for we'll have to ask Ivan.
Regarding uppercase for non-Latin alphabets, Unicode has several letter categories, and "Lo" (Letter, other) is neither lowercase nor uppercase -- and there are 127,004 of those. I'm not sure but it looks like that includes (almost) all CJK "letters".
If we're still interested after the vote I can ask around.
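The letter categories mentioned above can be inspected directly with the stdlib `unicodedata` module; a quick probe (the character choices are just illustrative):

```python
import unicodedata

# General categories: Lu = uppercase letter, Ll = lowercase letter,
# Lo = "other" letter with no case concept (most CJK ideographs).
for ch in "Aa名λ":
    print(ch, unicodedata.category(ch))
# A Lu
# a Ll
# 名 Lo
# λ Ll
```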
I am -0.5 on using uppercase.
One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.
Alternatively, the rule could be "Conforms to the Python code formatting standard for an enumeration constant", in which case I would be +0.
One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.
That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store.
Agreed. There are many disambiguating characters available. I'm personally in favor of ^ as a prefix, which would cause no ambiguity.
We could keep the door ajar for some variant of the uppercase rule by stipulating that capture variable names shouldn't start with a capital letter (after stripping leading underscores).
This shouldn't affect users whose alphabets have no case distinction (though a future decision to use a leading uppercase letter to mark dot-free loads would). It also shouldn't affect any serious use of match/case: PEP 8 is quite clear that locals should use all-lowercase, and while I've seen plenty of code that violates the recommendation of using UPPERCASE for named constants or CapWords for class names, I don't think I've seen much code using anything but lowercase for local variables, except on a whim. I doubt that anyone would even notice if we snuck this into the implementation without telling anybody. :-)
Glad to see the authors voting for no uppercase.
Many including me believe using uppercases this way is bad for Python.
If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)
Of course.
It seems that "no uppercase" will be adopted, and maybe no need to consider alphabets without lowercase/uppercase concepts.
In case you still want some information about CJK languages: usually a CJK language user will expect all characters to be either uppercase or lowercase (because this is the routine way). However, instead of using cases, CJK language users might prefer using 「名字」 as the load form of the variable 名字.
Looks like we're going with just the dot-in-the-middle rule ("purist"). People can create a dummy class and put values in there:
```python
def foo(pt, context):
    class c:
        ctx = context
    match pt:
        case Point(0, 0, context=c.ctx): ...
```
I imagine most people will not use the dummy class and instead go for:
```python
def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c == context: ...
```
And it's what I've seen in Haskell and Rust (see for example Listing 18-27 here)
This has been implemented. Still needs PEP though.
I imagine most people will not use the dummy class and instead go for
```python
def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c == context: ...
```
I'm not so sure. That idiom looks backwards: we want to compare a value, but instead we extract it and then add a guard. For the human reader, a guard is much more expensive to understand, because there are many other things you could test for in a guard. Plus, now the reader is wondering: is `c` only used in the guard, or also in the block (the `...` in your example), or in the code past the end of the whole match statement?
Also it would be repetitive if several case clauses need it.
With the adoption of the dot-in-the-middle rule, it seems that the use of "." as a sigil, if one is ever needed in the future, could carry a little extra weight by analogy. Surprisingly, I haven't seen* it explicitly stated or discussed anywhere (although it seems likely it was the intention) that the two rules combined would have compared nicely with the traversal rules of relative imports (and the current-working-directory concept on the Linux file system). Essentially, in this context, if I understand correctly: as `foo.bar` matches the constant `bar` loaded from the `foo` namespace, `.bar` could be said to match the constant loaded from the current one. Realizing this while reflecting on the PEP updates supplied a missing motivator that took the original PEP's constant patterns from arbitrary to making more sense to me personally. So I don't know if other readers might have missed this connection as well; not that other reasons for not keeping it in the PEP don't exist, of course.
Apologies if this is obvious and/or unwanted noise.
*searching this repo, python-dev, python-ideas, the PEP versions