
`is` operator for pattern-matching and binding

Open joshtriplett opened this issue 5 months ago • 163 comments

Introduce an is operator in Rust 2024, to test if an expression matches a pattern and bind the variables in the pattern. This is in addition to let-chaining; this RFC proposes that we allow both let-chaining and the is operator.

Previous discussions around let-chains have treated the is operator as an alternative on the basis that they serve similar functions, rather than proposing that they can and should coexist. This RFC proposes that we allow let-chaining and add the is operator.

The is operator allows developers to chain multiple matching-and-binding operations and simplify what would otherwise require complex nested conditionals. The is operator allows writing and reading a pattern match from left-to-right, which reads more naturally in many circumstances. For instance, consider an expression like x is Some(y) && y > 5; that boolean expression reads more naturally from left-to-right than let Some(y) = x && y > 5.
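For comparison, here is a minimal sketch of how the boolean condition x is Some(y) && y > 5 can be spelled in today's Rust. The function names are hypothetical, and the proposed is spelling appears only in a comment because it does not compile today.

```rust
/// Explicit `match`: every case is spelled out.
fn big_match(x: Option<i32>) -> bool {
    match x {
        Some(y) => y > 5,
        None => false,
    }
}

/// `matches!` with a guard: the same test as a single expression.
fn big_matches(x: Option<i32>) -> bool {
    matches!(x, Some(y) if y > 5)
}

fn main() {
    // Proposed spelling (not valid Rust today): `x is Some(y) && y > 5`
    for x in [Some(7), Some(3), None] {
        assert_eq!(big_match(x), big_matches(x));
        println!("{x:?} -> {}", big_matches(x));
    }
}
```

Both forms read pattern-first; the proposal would put the scrutinee x first instead.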

This is even more true at the end of a longer expression chain, such as x.method()?.another_method().await? is Some(y). Rust method chaining and ? and .await all encourage writing code that reads in operation order from left to right, and is fits naturally at the end of such a sequence.
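A small sketch of the reading-order point, using a synchronous chain with ? so it runs without an async runtime. The describe function and its strings are made up for illustration; the proposed spelling is shown only in a comment.

```rust
use std::num::ParseIntError;

/// Hypothetical example: parse a number and report how far it is above ten.
fn describe(input: &str) -> Result<String, ParseIntError> {
    // Today the pattern `Some(rest)` is written before the chain it matches.
    // Proposed spelling (not valid Rust today):
    //     if input.trim().parse::<u32>()?.checked_sub(10) is Some(rest) { ... }
    if let Some(rest) = input.trim().parse::<u32>()?.checked_sub(10) {
        Ok(format!("ten plus {rest}"))
    } else {
        Ok("below ten".to_string())
    }
}

fn main() {
    println!("{:?}", describe(" 42 "));
    println!("{:?}", describe("7"));
    println!("{:?}", describe("not a number"));
}
```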

Having an is operator would also help to reduce the demand for methods on types such as Option and Result (e.g. Option::is_some_and and Result::is_ok_and and Result::is_err_and), by allowing prospective users of those methods to write a natural-looking condition using is instead.
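As a rough illustration of the methods mentioned above, the sketch below uses Option::is_some_and, Result::is_ok_and and Result::is_err_and as they exist in the standard library today. The helper names and the "timeout" error string are assumptions for the example; the comments show how each condition might read with the proposed operator.

```rust
/// Proposed spelling (not valid today): `len is Some(n) && n > 5`
fn is_long(len: Option<usize>) -> bool {
    len.is_some_and(|n| n > 5)
}

/// Proposed spelling (not valid today): `r is Ok(n) && *n < 0`
fn is_negative(r: &Result<i32, String>) -> bool {
    // `is_ok_and` consumes the Result, so borrow it first with `as_ref`.
    r.as_ref().is_ok_and(|n| *n < 0)
}

/// Proposed spelling (not valid today): `r is Err(e) && e.contains("timeout")`
fn timed_out(r: &Result<i32, String>) -> bool {
    r.as_ref().is_err_and(|e| e.contains("timeout"))
}

fn main() {
    let failed: Result<i32, String> = Err("connection timeout".to_string());
    println!("{}", is_long(Some(8)));
    println!("{}", is_negative(&Ok(-4)));
    println!("{}", timed_out(&failed));
}
```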

Rendered

joshtriplett avatar Feb 16 '24 11:02 joshtriplett

Nominating because this is making a proposal for the 2024 edition.

joshtriplett avatar Feb 16 '24 11:02 joshtriplett

I see there is no mention of pattern types though it seems they would be similar but distinct use of is as an operator?

is this a pre-requisite of pattern types (to get the keyword in the language?) or does it conflict with the types usage?

fbstj avatar Feb 16 '24 11:02 fbstj

when combined with pattern types, what way does the precedence go? so, does v as i32 is 5 parse as (v as i32) is 5 or v as (i32 is 5)? or is it ambiguous and errors, requiring parentheses?

programmerjake avatar Feb 16 '24 12:02 programmerjake

@fbstj wrote:

I see there is no mention of pattern types though it seems they would be similar but distinct use of is as an operator?

is this a pre-requisite of pattern types (to get the keyword in the language?) or does it conflict with the types usage?

This is not related to pattern types. I believe we can do both without conflict. I added some text to the "unresolved questions" section to confirm that we can do both without conflicts.

@programmerjake wrote:

when combined with pattern types, what way does the precedence go? so, does v as i32 is 5 parse as (v as i32) is 5 or v as (i32 is 5)? or is it ambiguous and errors, requiring parentheses?

I've added some text to the RFC, stating that this should require parentheses (assuming pattern types work with as).

joshtriplett avatar Feb 16 '24 12:02 joshtriplett

What patterns does is enable that aren't covered by matches!?

dev-ardi avatar Feb 16 '24 13:02 dev-ardi

@dev-ardi One example:

if expr is Some(x) && x > 3 {
    println!("value is {x}");
}
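
To make the difference from matches! concrete, here is a minimal sketch in today's Rust (without let chains). The report function is hypothetical. The point is that a binding made inside matches! is not visible in the body, so using x there needs a separate if let.

```rust
fn report(expr: Option<i32>) -> Option<String> {
    // `matches!` can test the pattern and a guard, but `x` is scoped to the
    // macro invocation, so a body that follows could not refer to it.
    let _test_only: bool = matches!(expr, Some(x) if x > 3);

    // To use `x` in the body, bind with `if let` and nest the extra test.
    if let Some(x) = expr {
        if x > 3 {
            return Some(format!("value is {x}"));
        }
    }
    None
}

fn main() {
    println!("{:?}", report(Some(4)));
    println!("{:?}", report(Some(3)));
    println!("{:?}", report(None));
}
```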

joshtriplett avatar Feb 16 '24 13:02 joshtriplett

I find it a bit odd that we would want both is expressions and let chains. They serve exactly the same purpose, the only difference being their reading order. I can understand the argument that we would want to have let chains due to people expecting them to work given we already have if let and the like but this feels like the wrong way to address that. I would instead expect us to deprecate if let and while let in favor of is and dropping let chains.

I feel like that should be added to the alternatives and/or pad out the feature duplication drawbacks paragraph.

Veykril avatar Feb 16 '24 13:02 Veykril

@Veykril wrote:

I would instead expect us to deprecate if let and while let in favor of is

That would be a massive amount of churn for very little benefit.

Nonetheless, you're right that I should add this to the alternatives section.

joshtriplett avatar Feb 16 '24 13:02 joshtriplett

Adding multiple ways to do the same thing also makes teaching Rust harder. let is everywhere in Rust: if let, while let, let-chains, let ... else, and so on. So you have to teach pattern matching with let anyway, meaning this "right-to-left" reading order will quickly become natural to Rust users. Introducing a different way, while easy and intuitive to understand, won't help much with code clarity IMO, as people are already used to reading let patterns.

flip1995 avatar Feb 16 '24 14:02 flip1995

I'd expect is to be a pretty common variable name, so it may be worth exploring less common words, like Some(y) binds x && y > 5 or x matches Some(y) && y > 5.

I do think larger expressions make the left vs right swap interesting, but remember Perl created chaos with its left vs right trickery, so one should really be careful here. matches maybe works both ways.

Yes both let Some(y) = x && y > 5 and let .. else become extremely confusing, but humans could parse some sensibly bracketed flavors, like { let super Some(x) = foo } && y > 5 ala https://blog.m-ou.se/super-let/

burdges avatar Feb 16 '24 16:02 burdges

If we add is as a keyword, we should also reserve isnot as a keyword for future NOT-patterns

if expr isnot Some(x) {
    println!("error");
}
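
For reference, a sketch of how the non-matching case can already be written in today's Rust, either by negating matches! or with let ... else. The function names are hypothetical, and isnot appears only in a comment since it is not valid syntax.

```rust
/// Proposed spelling from the comment above (not valid Rust): `if expr isnot Some(x) { ... }`
fn check(expr: Option<i32>) -> &'static str {
    if !matches!(expr, Some(_)) {
        return "error";
    }
    "ok"
}

/// `let ... else` handles the non-matching case first and keeps `x` bound afterwards.
fn double(expr: Option<i32>) -> Result<i32, &'static str> {
    let Some(x) = expr else {
        return Err("error");
    };
    Ok(x * 2)
}

fn main() {
    println!("{} {}", check(Some(1)), check(None));
    println!("{:?} {:?}", double(Some(21)), double(None));
}
```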

Edited: I'm sorry for some impoliteness with "must"

VitWW avatar Feb 16 '24 16:02 VitWW

The author mentioned just one alternative name for is: ~. But I think we should add other alternative names to the RFC, like equal or identic:

if expr identic Some(x) && x > 3 {
    println!("value = {x}");
}

VitWW avatar Feb 16 '24 17:02 VitWW

@VitWW I don't think so, Vit.

workingjubilee avatar Feb 16 '24 19:02 workingjubilee

@VitWW

If we add is as a keyword, we must also reserve isnot as a keyword for future NOT-patterns

You speak in a commanding way ("we must"), without justification. So, having considered my own thoughts: ...I disagree! Please offer a justification for your reasoning, and especially, why it should be addressed now, not "we do this for future expansion opportunities". It seems it will simply run up against all the concerns we're already facing, and we can wait until then.

The author mentioned just one alternative name for is: ~. But I think we should add other alternative names to the RFC, like equal or identic:

To say we should do something is better than to command, but I don't think you have explained the prior art or other reasoning why it must be addressed in the RFC. Perhaps you were building off the point that burdges made? But unfortunately equal is also a common function name in Rust, used in e.g. polars (as public API) and the stdlib (as private), and also seems to be a reasonably popular variable name. So it at least doesn't feel obvious as to why we would go with that.

For everyone else suggesting alternative keywords, I do really recommend everyone at least check using grep.app or something similar if their recommendation is in Rust public API somewhere, and how many cases, and be forthcoming on how many examples they find. You will likely pull hundreds of pages, so you may wish to do extrapolation or more exact queries using other tools after downloading the crates.io index.

Of course, we do have our system of keyword reservation, the k# and r# stropping, and edition-sensitive keyword parsing, so I think this is not the only thing to consider, and we can in fact simply pick the nicest-looking syntax if it doesn't seem an overwhelming problem. But it is best if we keep in mind any induced complexity in the lexer and parser, and the community reaction, while we rummage through our collection of Pantone chips for this shed.
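
As a small illustration of r# stropping, the sketch below uses raw identifiers as they work in current Rust. The function names and the greeting string are made up; the k# form is left out because, as far as I know, it is not available in stable Rust.

```rust
/// `match` is a keyword, so it needs `r#` to be used as a function name.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn icelandic_greeting() -> &'static str {
    // `is` is an ordinary identifier today. `r#is` already names the same
    // variable, and that spelling would remain available if `is` became a
    // keyword in a later edition.
    let is = "góðan daginn";
    r#is
}

fn main() {
    println!("{}", icelandic_greeting());
    println!("{}", r#match("dag", icelandic_greeting()));
}
```

Existing code that uses is as a name would need this kind of rewrite only in crates that opt in to the new edition.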

workingjubilee avatar Feb 16 '24 19:02 workingjubilee

@joshtriplett While I think is, er, is a fine choice, I wish to (gently) refute ~ as lacking a history as a "pattern-matching operator", and provide some background that might be worth at least reviewing. First, SQL does have bare ~ but I think it is reasonable to mostly omit considering SQL's language features, as it is deliberately unlike most other PLs for reasons beyond this discussion. However, ~= and =~ do have prominent histories as a pattern-matching operator!

  • Swift does use ~= as a pattern-matching operator, and even uses it as part of case evaluation: https://developer.apple.com/documentation/swift/range/~=(_:_:)
  • Ruby offers the inverse, =~, for a regex-centric pattern-matching: https://ruby-doc.org/core-2.6.3/Regexp.html#class-Regexp-label-3D~+and+Regexp-23match
  • And part of why it does so is because Bash does it: https://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html#index-_005b_005b

Obviously, the regexp-centric examples don't exactly match the Rust pattern language, but it's clearly a popular choice if three exceptionally common procedural PLs use it. Other examples like Vimscript and PromQL also use them, but obviously that gets increasingly niche. Wiktionary even asserts ~= is used in mathematics... but also mentions ~= is used as an equivalent to Rust's !=, e.g. in Lua and MATLAB.

It seems to me that when ~ is included in an operator's symbol, either it means negation, or it implies something akin to saying "roughly like...", an approximate match, which may be why Dart uses ~/ for divide-to-integer (as opposed to dividing to a double, which more accurately represents the result of 3 / 2). Of course, that very page I just cited also mentions Dart has is, so I only consider this to be interesting context!

workingjubilee avatar Feb 16 '24 20:02 workingjubilee

Some reactions I had while walking and thinking on this earlier:

  • I like is for legibility and I think it will probably read nicer than let chains in almost all cases
  • Python has an is operator for object identity, which is almost only used for x is None, which the operator here would support. A possible addition to the prior art.
  • I strongly agree with the concern that's been repeated a few times here that we already have forms like if let and also let - else, and the distinction here is currently proposed to just be a style choice.

Especially as the recent language survey seemed to highlight language bloat as one of the largest risks to the language, having this purely be stylistic seems to be in direct opposition to the data.

If we were to move forward with this I'd hope that this RFC takes a stronger stance on when to use let forms and when to use is forms, and strongly considers the deprecation alternative.

davidhewitt avatar Feb 16 '24 23:02 davidhewitt

Possible observation: by allowing expr is PAT && condition here, users may be more likely to try PAT && condition as match arms instead of the current PAT if condition. We may want to allow that:

match color {
    (RGB(r, g, b) | RGBA(r, g, b, _)) && r == b && g < 1 => /* ... */,
                                      ^^ - this is currently a compile error, should be `if`
    _ => /* ... */
}

... I think it'd ease refactoring and papercuts when converting code between x is PAT && y { ... } to match x { PAT && y => ... }
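
For reference, a runnable sketch of the form that compiles today, with the guard introduced by if. The Color enum and the classify function are assumptions made to complete the snippet above.

```rust
#[derive(Debug, Clone, Copy)]
enum Color {
    RGB(u8, u8, u8),
    RGBA(u8, u8, u8, u8),
}

use Color::{RGB, RGBA};

fn classify(color: Color) -> &'static str {
    match color {
        // Today the guard is introduced with `if`; writing `&&` directly
        // after the pattern is a compile error.
        (RGB(r, g, b) | RGBA(r, g, b, _)) if r == b && g < 1 => "magenta-like",
        _ => "other",
    }
}

fn main() {
    println!("{}", classify(RGB(9, 0, 9)));
    println!("{}", classify(RGBA(9, 0, 9, 255)));
    println!("{}", classify(RGB(9, 1, 9)));
}
```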

davidhewitt avatar Feb 16 '24 23:02 davidhewitt

While I think is, er, is a fine choice, I wish to (gently) refute ~ as lacking a history as a "pattern-matching operator", and provide some background that might be worth at least reviewing. First, SQL does have bare ~ but I think it is reasonable to mostly omit considering SQL's language features, as it is deliberately unlike most other PLs for reasons beyond this discussion. However, ~= and =~ do have prominent histories as a pattern-matching operator!

* Swift does use `~=` as a pattern-matching operator, and even uses it as part of `case` evaluation: https://developer.apple.com/documentation/swift/range/~=(_:_:)

* Ruby offers the inverse, `=~`, for a regex-centric pattern-matching: https://ruby-doc.org/core-2.6.3/Regexp.html#class-Regexp-label-3D~+and+Regexp-23match

* And part of why it does so is because Bash does it: https://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html#index-_005b_005b

Obviously, the regexp-centric examples don't exactly match the Rust pattern language, but it's clearly a popular choice if three exceptionally common procedural PLs use it. Other examples like Vimscript and PromQL also use them, but obviously that gets increasingly niche. Wiktionary even asserts ~= is used in mathematics... but also mentions ~= is used as an equivalent to Rust's !=, e.g. in Lua and MATLAB.

It seems to me that when ~ is included in an operator's symbol, either it means negation, or it implies something akin to saying "roughly like...", an approximate match, which may be why Dart uses ~/ for divide-to-integer (as opposed to dividing to a double, which more accurately represents the result of 3 / 2). Of course, that very page I just cited also mentions Dart has is, so I only consider this to be interesting context!

Just to follow up on this a bit, particularly from a mathematical perspective. Yes, you're right that ~ has some similarity to ≈, which means "approximately equal to," and thus it makes sense as a pattern-matching operator.

However, ~= and =~, from a programming perspective, are far too loaded to really work well as that kind of operator. Like, I've been writing a lot of Lua lately and ~= is just straight-up != in Lua.

Plus, with the way Rust tends to organise its operators, the existence of ~= implies that there should be a standalone ~, which wouldn't be the case here. So, I would advocate against that regardless.


Drawing to the bigger point of what this operator should be: I genuinely don't think that there's something better than is. It's two characters, which is as long as many existing operators. People say that it's a common variable name, but I think that it's only common as a pluralisation of i, where i_s could easily serve that purpose. And the only other reasonable alternative that I can think of is ~, which is shorter and less clear. Any other keywords are going to be longer, more awkward, and more likely to cause name conflicts.

I point out some of the alternatives in the RFC because I think that we should definitely include the best arguments in favour of is in the RFC, but I genuinely do think that it's the best choice.

clarfonthey avatar Feb 16 '24 23:02 clarfonthey

If we add is as a keyword, we must also reserve isnot as a keyword for future NOT-patterns

I disagree. IMO not-patterns can be written as !Some(_). A !-pattern can be used everywhere a fallible pattern can (match, if let, is, let ... else), so it isn't an is! operator. This means there are two ways to write it, with the not operator: !(a is Some(v)) || v == 0 or with a not-pattern: a is !Some(v) || v == 0 or a is (!Some(_) | Some(0))
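
Since neither is nor !-patterns exist today, here is a hedged sketch of the same "None or Some(0)" condition in current Rust, once by negating a positive match and once by listing the complement by hand. The function names are hypothetical.

```rust
/// Negate a positive match: "not (Some(v) with v != 0)".
fn none_or_zero_negated(a: Option<u32>) -> bool {
    !matches!(a, Some(v) if v != 0)
}

/// Enumerate the complement by hand, roughly `a is (!Some(_) | Some(0))`.
fn none_or_zero_listed(a: Option<u32>) -> bool {
    matches!(a, None | Some(0))
}

fn main() {
    for a in [None, Some(0), Some(5)] {
        assert_eq!(none_or_zero_negated(a), none_or_zero_listed(a));
        println!("{a:?} -> {}", none_or_zero_listed(a));
    }
}
```

Listing the complement only works when the other variants are few and known, which is part of what a dedicated not-pattern would avoid.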

programmerjake avatar Feb 17 '24 00:02 programmerjake

You speak in a commanding way ("we must"), without justification. .... Please offer a justification for your reasoning, and especially, why it should be addressed now

@workingjubilee I'm sorry for some impoliteness with "must". Not-patterns also weren't added because in today's Rust syntax they are ugly to write: NOT(Some(x)) = expr. They become almost pretty with an isnot keyword. It should be reserved as a keyword now because it is dual to is, just like >=/<= and ==/!=, and it is strange to add just one of a dual pair.

But unfortunately equal is also a common function name in Rust

Uups

VitWW avatar Feb 17 '24 01:02 VitWW

People say that it's a common variable name, but I think that it's only common as a pluralisation of i, where i_s could easily serve that purpose.

I won’t claim it’s common, but it’s probably worth noting that is is the country code for Iceland, and so is a natural variable name for strings containing Icelandic-language text.

e2-71828 avatar Feb 17 '24 06:02 e2-71828

I prototyped this feature back in 2018 and converted rustfmt to this style, but later dropped the corresponding rustfmt branch, accidentally and unfortunately. But the experience report is preserved at least - https://github.com/rust-lang/rfcs/pull/2260#issuecomment-367158854.

I still think this is the right thing to do, and something that should have been added instead of if-let chains from the start. It would be unfortunate if the scenario I predicted in https://github.com/rust-lang/rfcs/pull/2497#issuecomment-404860099 plays out and EXPR is PAT is not added for social reasons because if-let chains already exist.

petrochenkov avatar Feb 17 '24 08:02 petrochenkov

@petrochenkov Agreed. I think let chains have value because if-let already exists and people expect let chains to work, but I don't think that should prevent us from adding is. That would feel like a suboptimal path caused by path dependence.

joshtriplett avatar Feb 17 '24 08:02 joshtriplett

Considering the multiple bugs around temporaries that were found with let chains, perhaps we should just reserve the is keyword in edition 2024 and give the implementation more time to mature?

ChayimFriedman2 avatar Feb 17 '24 19:02 ChayimFriedman2

Just because I haven't seen anyone comment on it yet, I would like to know if my intuition that is should have higher precedence than == (but still recommend parentheses, similar to mixing && and ||) matches others' intuition as well. I could just be an outlier here and would love if others pitched in how they feel as well.

Particularly this thread: https://github.com/rust-lang/rfcs/pull/3573#discussion_r1492740859

Feel free to just thumbs up/thumbs down to express support if you don't have much else to add.

clarfonthey avatar Feb 18 '24 16:02 clarfonthey

@clarfonthey wrote:

Just because I haven't seen anyone comment on it yet, I would like to know if my intuition that is should have higher precedence than == (but still recommend parentheses, similar to mixing && and ||) matches others' intuition as well. I could just be an outlier here and would love if others pitched in how they feel as well.

My intuition tells me "there is no possible circumstance in which I would ever want to see these combined without parentheses", which makes me feel that it's irrelevant what their relative precedence is.

(I think that's true for a few other cases in the existing precedence table as well.)

joshtriplett avatar Feb 18 '24 21:02 joshtriplett

That is disappointing to hear. People tend to eschew parentheses where they are unnecessary because the language already has many cases where some kind of parenthetical or brace or bracket is already either mandated by the syntactic form or is mandated by expressing the desired result, and it does not actually make the code significantly less clear to imitate Lisp slightly less.

workingjubilee avatar Feb 20 '24 06:02 workingjubilee

Why would anyone need either boolean == (x is Some(z)) or (value == y) is true so frequently that one or two pairs of parentheses are going to bother them? :confused:

kennytm avatar Feb 20 '24 08:02 kennytm

People tend to eschew parentheses where they are unnecessary

That's my preference as well, for cases that are widely parsed correctly by people who don't have the precedence table memorized. But for instance, the lint against using && and || together without parentheses is a good example where we suggest that they are more necessary than the precedence table would otherwise indicate. I think there are some cases that are intuitively obvious to people, and others where if you don't have the precedence table memorized you're likely to find them confusing. And I've regularly seen confusion about (for instance) the parsing of as.

I do personally think mixing == and is without parentheses seems more likely to lead to confusion than clarity. If many people feel strongly in the other direction, I could imagine changing that from "parentheses are always required" to "warning lint for not using parentheses", like && and ||. In any case, I will include it in the alternatives section.
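
To ground the precedence examples mentioned here, a small sketch of how && versus || and as parse in today's Rust. The function names are hypothetical, and the example says nothing about where is itself would sit in the table.

```rust
/// `&&` binds tighter than `||`, so this parses as `a || (b && c)`.
fn mixed(a: bool, b: bool, c: bool) -> bool {
    a || b && c
}

/// The other grouping has to be requested with parentheses.
fn grouped_left(a: bool, b: bool, c: bool) -> bool {
    (a || b) && c
}

/// `as` binds tighter than `+`, so this parses as `((n as u8) as u32) + 1`.
fn low_byte_plus_one(n: u32) -> u32 {
    n as u8 as u32 + 1
}

fn main() {
    println!("{}", mixed(true, false, false));
    println!("{}", grouped_left(true, false, false));
    println!("{}", low_byte_plus_one(0x1FF));
}
```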

joshtriplett avatar Feb 20 '24 08:02 joshtriplett

On parens:

The safe thing to do is start out always requiring them, since then we could look at how the code comes out with them, and remove the requirement as a non-breaking change later once we have evidence.

scottmcm avatar Feb 20 '24 08:02 scottmcm