[RFC] Add smartmatch (flexible "satisfies"/"belongs-to") operator
(Background: I have been thinking about smartmatch a lot recently. e.g., https://github.com/wren-lang/wren/issues/956#issuecomment-817233866 and https://github.com/wren-lang/wren/issues/968#issuecomment-819148504 . I realized I should actually propose it for independent discussion! This builds on my https://github.com/wren-lang/wren/issues/968#issuecomment-819954076 in the discussion of x in y as an operator. Thanks to everyone participating in #956 and #968 for thought-provoking discussion! Thanks also to all the folks who have worked on smartmatch in Raku over the years.)
Many programs at some point ask "does X have property Y?" or "is X part of collection Y?". For example:
- Regular-expression matches: "does this string match this regex?"
- Substring matches: "does this needle occur in this haystack?"
- List-membership testing (#968): "is this element part of this list?"
- Range testing: "is this number in this range?"
- Typechecking: "is this value of this type?"
- Pattern-matching: "does this value match a value in this list?"
I propose taking a page from the Raku programming language. Raku unifies these tests under a binary operator called "smartmatch".
Overview
- Smartmatch is a binary operator, spelled `~~` in Raku. `val ~~ thing` checks whether `val` matches `thing`. What "match" means depends on `thing`. For example:
  - `1 ~~ 1.0`, since thing `1.0` considers numerically equal values to match
  - `1 ~~ 1..10`, since thing `1..10` considers all numbers in the range to match.
- `~~` has the same precedence and associativity as `==`.
- Every object has a special method that says whether a value matches that object. In Wren, I would use `~~(_)`, so `foo ~~ bar` would be exactly `bar.~~(foo)`.
  - Note: this may require a new opcode; see https://github.com/wren-lang/wren/issues/968#issuecomment-819762998
  - `~~(_)` does not have to return a boolean since all Wren values can be tested for truthiness.
  - Suggested implementations for the built-in classes are listed below
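To make the dispatch rule concrete, here is a minimal sketch that runs on stock Wren. Since `~~` does not exist there, the hypothetical method name `matches(_)` stands in for `~~(_)`, and the `Between` class is invented purely for illustration.

```wren
// Stand-in for the proposed operator: `val ~~ thing` would call thing.~~(val).
// `matches(_)` is a hypothetical name used because stock Wren has no `~~`.
class Between {
  construct new(lo, hi) {
    _lo = lo
    _hi = hi
  }

  // The right-hand operand decides what "match" means.
  matches(value) { value is Num && value >= _lo && value <= _hi }
}

var teens = Between.new(13, 19)
System.print(teens.matches(15))    // true; would be spelled `15 ~~ teens`
System.print(teens.matches(42))    // false
System.print(teens.matches("15"))  // false
```

The value being tested is the argument and the pattern is the receiver, which is the inversion relative to other binary operators.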
Advantages
- Handles multiple use cases without having to change the syntax.
  - E.g., in a bounding-box class, `~~` could check for intersection.
  - If regexes are added (#933), `str ~~ regex` could check for regex match as a shorthand for `regex.match(str)`.
- Permits users to customize behaviour for their specific programs from pure Wren code
- Can test for list/sequence membership (#968) without risking confusion with `for`
  - Re. @jube's https://github.com/wren-lang/wren/issues/968#issuecomment-820214452: `x ~~ [1,2,5]` is visibly different from `for(x in [1, 2, 5])`
Advantages when used with a switch statement
Smartmatch provides a very clean way to express switch cases (#956). Each case can be the right-hand side of a smartmatch. That way you can have any case expression you want without having to special-case syntax to support complex conditionals.
For example, in switch(val):
- Simple conditional `case 3: ...` would test `val ~~ 3`, which I suggest be implemented as `val == 3`.
- Complex conditional `case [1,2,5]: ...` would test `val ~~ [1,2,5]`, which I suggest test whether `val` occurs in list `[1,2,5]`.
Switch+smartmatch can support arbitrarily complex conditions using only Wren code. Programmers can define classes that implement ~~(_) and encapsulate the conditions into those classes.
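As a rough illustration of that idea in stock Wren, the sketch below emulates a switch as a list of pattern/result pairs. Everything here is hypothetical: `matches(_)` stands in for `~~(_)`, and `Exactly`, `OneOf`, and `Switch.pick` are names made up for the example, not part of the proposal.

```wren
// Hypothetical stand-ins: `matches(_)` plays the role of `~~(_)`.
class Exactly {
  construct new(expected) { _expected = expected }
  matches(value) { value == _expected }
}

class OneOf {
  construct new(choices) { _choices = choices }
  matches(value) { _choices.contains(value) }
}

class Switch {
  // Returns the result paired with the first case whose pattern matches.
  static pick(value, cases, fallback) {
    for (entry in cases) {
      if (entry[0].matches(value)) return entry[1]
    }
    return fallback
  }
}

var cases = [
  [Exactly.new(3), "three"],
  [OneOf.new([1, 2, 5]), "one, two or five"]
]

System.print(Switch.pick(3, cases, "no match"))  // three
System.print(Switch.pick(5, cases, "no match"))  // one, two or five
System.print(Switch.pick(9, cases, "no match"))  // no match
```

The switch logic never inspects the pattern's type; adding a new kind of case only means writing another class with a `matches(_)` method.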
Smartmatch is a great complement to switch statements. I think it would be useful even if switch were not added to Wren. However, if you disagree, I certainly understand.
Suggested implementations for ~~
A starting point for discussion.
- Object: same as `Object.==(_)`. This also serves for `Bool`, `Fiber`, `Null`, `Num`, (edit) `String`, and `System`, and for optional `Meta` and `Random`.
- Class: `x ~~ SomeClass` === `x is SomeClass`
- Fn: `x ~~ fn` === `fn.call(x)`. This allows functions to be used for complex tests.
  - E.g., `val ~~ Fn.new {|v| v>0 && v<=100 && v%2 == 1}` to test if `val` is an odd number between 1 and 99
- Map: `x ~~ map` === `map.containsKey(x)`
- Sequence: `x ~~ seq` === `seq.contains(x)` (element membership). This also serves for `List` and `Range`.
- (edit) I suggest equality testing for Strings also. (original was: ~~String: `x ~~ str` === `str.contains(x)` (substring test)~~; see https://github.com/wren-lang/wren/issues/989#issuecomment-830723064)
Edit: per the discussion below, I am adding `!~`, which is just like `~~` but with the opposite result. I recommend that `Fn.!~(_)` throw, since I don't know right now what that would mean.
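The suggested rules can be modelled in stock Wren with a single dispatcher. This is only a sketch: `Smart.match` and `Smart.noMatch` are hypothetical helpers standing in for `~~` and `!~`, and a real implementation would put a `~~(_)` method on each class instead of testing types in one place.

```wren
class Smart {
  // Hypothetical helper: models `value ~~ thing` using the suggested rules.
  static match(value, thing) {
    if (thing is Class) return value is thing
    if (thing is Fn) return thing.call(value)
    if (thing is Map) return thing.containsKey(value)
    // String is itself a Sequence in Wren, so it is tested first:
    // equality, not substring.
    if (thing is String) return value == thing
    if (thing is Sequence) return thing.contains(value)
    return value == thing
  }

  // Hypothetical helper: models `value !~ thing`, aborting for Fn as recommended.
  static noMatch(value, thing) {
    if (thing is Fn) Fiber.abort("!~ is not defined for Fn")
    return !match(value, thing)
  }
}

System.print(Smart.match(1, 1.0))          // true
System.print(Smart.match(5, 1..10))        // true
System.print(Smart.match(5, Num))          // true
System.print(Smart.match("k", {"k": 1}))   // true
System.print(Smart.match("a", "bar"))      // false
System.print(Smart.noMatch(4, [1, 2, 3]))  // true
```

The order of the checks matters only for `String`, which would otherwise fall into the `Sequence` rule.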
Implementation in the VM
I would add a CALL_SWAPPED opcode per https://github.com/wren-lang/wren/issues/968#issuecomment-820196805 . The same as regular CALL, but it takes the arguments in the opposite order. That would permit ~~ to be implemented without having to juggle the stack. However, that is only one of many possible options.
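For readers less familiar with the stack layout, here is a toy model in plain Wren of why the operands need swapping. It uses a list as the operand stack and is not the real VM; it only assumes that operands are pushed left to right and that a call expects the receiver below its argument.

```wren
// Toy model of the operand stack for `foo ~~ bar`; not the real VM.
var stack = []
stack.add("foo")  // left operand, pushed first
stack.add("bar")  // right operand, pushed second

// A regular call expects [receiver, argument], but the receiver here must be
// the right operand, so the top two slots are exchanged before the call.
var last = stack.count - 1
var tmp = stack[last]
stack[last] = stack[last - 1]
stack[last - 1] = tmp

System.print(stack)  // [bar, foo]
```

A swapped-call opcode would fold this exchange into the call itself, while a separate swap opcode would do it as its own step.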
Thank you for reading all the way to the bottom :D .
After some reflection on the matter, I've come to the conclusion that a smart-match operator would be a powerful idea and, although you'd need to remember how the built-in classes would behave, the behavior is intuitive anyway and shouldn't be difficult to grok.
I also agree that this would be very useful from a pattern matching perspective if switch is introduced.
The only thing I'm not fond of is the ~~ operator itself which looks odd to me.
I wonder if we could get away with just using a single ~ as the existing use of the tilde is as a unary rather than a binary operator and we have the precedent of doing the same for the - operator (and if #986 is accepted the + operator) without apparently anyone being too confused.
An alternative would be to use some other symbol such as @ or $ which are unused at present though we might want to keep these in reserve for possible future uses.
There is also some precedent for the operator ~=, at least in Lua and probably in other languages. But considering the intended usage, it looks odd...
I still have some reservations, I need to see it in action and its implementation.
~= in Lua appears to be the equivalent of != in Wren. See here.
That would fit in with your own proposal #985 to allow ~ as an alternative to ! for Bool operations.
~= does not really make any sense as an assignment operator, because ~ has no meaning as a binary operator (the same goes for !), and because of its nature I don't think it is a good idea at all to allow it...
Well, if ~ were allowed as an alternative to !, then it would make sense to allow ~= as an alternative to !=.
But you're right that this has nothing to do with compound assignment operators so I've edited my previous post accordingly.
Both ~~ and ~= are fine for me, I only prefer the second one because of the symmetry with the other equality operators.
The biggest reservation I have is about how you declare such a method: because of the inversion, I can't find a practical way to express it properly inside the class.
Well, I think if we used ~= as the smart-match operator (and I'd be happy with that) , then it would be better to forget using ~ as an alternative to ! and restrict #985 to just implementing &, | and ^ on the Bool class.
Hmmm don't know what to think. a ~= b would only have some meaning as ~(a == b) as per symmetry with != which should make it strictly equivalent to a != b. So the trivial implementation does not really have a real meaning/benefit.
I'm not very comfortable with the definition of the rules in general and the Object one in particular. It has too many potential meanings, which depend only on the right-hand side of the operator (contrary to in), and that can be a source of error/confusion.
Well, if we do introduce compound assignment operators, then ~= is not going to be one of them because the bit-wise complement operator ~ is unary.
So, I think it would be reasonable to use ~= for smart-matching which you said you preferred to ~~ yourself.
However, to avoid overloading ~ too much, I'd drop the idea of using it as a Bool operator as we don't need it for that purpose anyway.
Binary ~, ~= are fine with me, or =~ for another option.
I thought about =~ but I discarded it because foo=~bar is ambiguous. It can be:
- smartmatch of `foo` on `bar`
- assignment of operator `~` on `bar` to `foo`
@mhermier good point about the possible parse ambiguity.
Re. ~= vs ~(==) https://github.com/wren-lang/wren/issues/989#issuecomment-826318241 --- it's a fair point.
- However, since `~` is not logical negation, I don't think the analogy with `!=` necessarily holds.
- There's no risk of confusion with compound assignment since there's no binary `~`.
Tilde is nice because it connotes "like". However, if it's too problematic, my next choice would be @, @=, or ::.
- `@`, e.g., `if(foo @ [1,2,3])`: Is the LHS "at" (in the same region as) the RHS? And it's a symbol that's not currently used.
- `::`, e.g., `if(foo :: [1,2,3])`
  - `::` generally expresses a relationship. The colon can be very valuable, though, so I wouldn't just use a single colon here.
  - I haven't seen `::` used much in other languages outside of BNF and namespacing. Wren doesn't have namespaces, and if it did, I would recommend using `.` rather than introducing a separate scope-resolution operator.
There might also be a case for using $, also currently unused, which is like an S with a vertical bar through it.
The S is suggestive of 'smart' and | is used in some languages as a delimiter in match statements.
TBH, I don't know which I like best.
@cxw42 As it's your idea, I think you should choose :)
To me, because of Smalltalk, @ is the coordinate operator: when put between 2 numbers it produces a Point.
:: is problematic because of C++ which makes it more like variable lookup...
Side note: this is the reason I use logical unary left . to access top level scope on my personal branch ^^
@PureFox48 thanks :) . I did some typing tests to check ergonomics, and I thought of one other option: ~: (the "parrot" operator? :D ). That has the advantage over ~= that (on my keyboard) I don't have to lift the Shift key in the middle of the operator.
My preference would be ~~ first, then ~=, ~:, ::, @=, @, $. I have strong personal associations between $ and variables (e.g., shell vars), which is the only reason I would prefer it least.
@cxw42
Well, as @mhermier doesn't like @ or :: and ~~ is your first preference, let's go with that.
It doesn't really have any technical problems, it will be familiar to those who know Raku and there are plenty of precedents for using a doubled symbol as an operator.
Although it looked a bit odd to me at first, I think I'm beginning to warm to it :)
It is not that I don't like it; it is just that it has strong connotations, which would make for a steep learning curve.
As I don't know Smalltalk, the only meaning @ has for me is 'at'.
I agree though that :: wouldn't be a good idea as it will have strong connotations as a scope resolution operator to many people.
I take it you're on board with using ~~, as originally proposed ?
A further thought.
Would it make sense to have a second operator !~ to mean not a match?
I support that, and I doubt much existing code logically negates the result of a bitwise complement :D
I hadn't even realized that something like !~42 was legal before but apparently it is (it returns false) because the Num class is inheriting the ! operator from Object.
I don't think this means that !~ (and for that matter ~~) wouldn't be viable as we'd be using it as a binary operator rather than two successive unary operators.
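For reference, this is what the two successive unary operators do in current Wren. The snippet only uses existing behaviour: `~` is the 32-bit bitwise complement on `Num`, and `!` on any non-Bool, non-null object yields `false`.

```wren
// Two unary operators applied in sequence: first ~, then !.
System.print(~42)   // 4294967253
System.print(!~42)  // false
```

Both forms have a single operand, whereas the proposed `!~` would always sit between two operands.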
Incidentally, having a negative match operator would further enhance the attraction which the smart-match operator has compared to in for expressing containment.
Instead of: !(x in [1, 2, 3]) we could simply write x !~ [1, 2, 3].
Also being able to write something like x !~ Num when checking x's type would compensate for not having a negative is operator.
!~~ or !~= are not the most elegant, but would do if needed.
Off topic: I suspect that this is a sign that the real equality operator is = and not ==, as != shows; following that logic... There is even more evidence in >=, <=... I understand C's motivation for wanting a short assignment operator, but you rediscover the inconsistencies when you try to follow the same logic and it fails...
The way I'm seeing this right now is that ~~ and !~ would be analogous to == and !=.
So the = symbol would be replaced by ~ to reflect the fact that the operator is smart-matching (which may test for containment etc) rather than always testing for equality.
I've gone off using ~= altogether. Even though it can't be, it still looks like it's a compound assignment operator. Also a negative version would need to be something like !~= which is very ugly.
I have a strawman implementation at https://github.com/cxw42/wren/tree/smartmatch if anyone wants to try it! I implemented it using a new SWAP opcode for simplicity.
Example:
```wren
class Test {
  construct new() {}
  ~~(needle) { 42 }   // Note: `needle ~~ haystack` calls haystack.~~(needle)
}
var test = Test.new()
System.print(1 ~~ test)   //> 42
```
I have not yet added any default implementations but will be working on those.
I proposed at the top for String that x ~~ str be str.contains(x) (substring test). I just realized that won't work well with a switch statement: switch("a") { case "bar"... } shouldn't match just because there is an a in bar. I looked back at the Raku docs, and Raku's string smartmatch is equality rather than substring. For those two reasons, I have modified my https://github.com/wren-lang/wren/issues/989#issue-866834010 to suggest string equality.
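The difference between the two String rules can be checked in current Wren. The variable names below are just for illustration, and the comments spell out which proposed rule each line models.

```wren
var subject = "a"
var caseLabel = "bar"

// Original rule (substring): `subject ~~ caseLabel` as caseLabel.contains(subject)
System.print(caseLabel.contains(subject))  // true, so the "bar" case would fire

// Revised rule (equality): `subject ~~ caseLabel` as subject == caseLabel
System.print(subject == caseLabel)         // false
```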
While it makes a good start to toy with, I still don't like the syntax of the declaration in the class. The only way I can see to write it for now would be something like:
(needle)~~(this) {...}
But that would require changing all unary operators...
I suspect this is because you want to test for equality more than for substring (and this should be the same for every container/collection).
I think the String problem is just a symptom that this operator is problematic: it serves too many purposes. If it's the "contained in" operator, then it should refer to String.contains(), and not have any implementation for Num, for example. If it's the switch operator, it should perform an equality comparison for strings and be implemented for almost all primitives. The fact that the symbol ~~ has no meaning in math (nor in mainstream languages) also indicates that this is an overly-used operator, so you can't give it a proper name.
Instead, I think we should think about splitting the roles. We can have an in operator, and a case or whatever-called switch match operator. They're similar in the fact that they're both inverted (relative to the other operators), and thus need a CODE_SWAP, but different in purpose.
I find https://github.com/wren-lang/wren/issues/989#issuecomment-830723064 particularly disappointing as I felt that sub-string matching was an important part of this proposal.
I don't think it's necessarily fatal to the original proposal as switch("bar") { case "bar"... } would still have matched even then.
However, @ChayimFriedman2 may be right that it's best to split the roles though, if we split off containment, I'm not sure that this leaves much of a role for smart-matching as we can already do type-checking with is and equality with ==.
As far as containment is concerned, although it was my idea to reuse in and despite objections I still think it's a plausible proposal, I wonder whether it would be better to come up with a new operator instead? I suggest the at symbol, @, might be the best choice of those still available. An advantage of using a symbol rather than a word is that we could then use !@ to mean not contained. Some examples to see how this would look:
```wren
var a = 2
var b = a @ [1, 2, 3]   // true
var c = a !@ [4, 5, 6]  // true
var d = 3
var e = d @ 4..8        // false
var f = d !@ 0..2       // true
var g = "a"
var h = g @ "bar"       // true
var i = g !@ "baz"      // false
```
I'm not sure whether I like this or not but I think it's worth considering.
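For comparison, the same checks can already be written in current Wren with `contains(_)` and `!`. This is just the existing spelling next to the proposed one, using the same values as the examples above.

```wren
var a = 2
System.print([1, 2, 3].contains(a))   // true,  proposed: a @ [1, 2, 3]
System.print(![4, 5, 6].contains(a))  // true,  proposed: a !@ [4, 5, 6]

var d = 3
System.print((4..8).contains(d))      // false, proposed: d @ 4..8
System.print(!(0..2).contains(d))     // true,  proposed: d !@ 0..2

var g = "a"
System.print("bar".contains(g))       // true,  proposed: g @ "bar"
System.print(!"baz".contains(g))      // false, proposed: g !@ "baz"
```

The operator form puts the element first and avoids the parentheses around ranges, while the method form needs no new syntax.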
Python uses in, and probably other languages too.
Do you have an example of a language that uses an operator for this (preferably a mainstream one)? If not, the cognitive overhead will probably be too much.