problem-solving
`div` and `mod` don't convert to Int, unlike everything else
Hello,
there seems to be a very definite principle in Raku that, by default, operators try to interpret the given data reasonably for the operation one expresses with the operator. There are some operators, however, that stand out and make things break surprisingly:
'6' / 2 #works, gives 3
'6' div 2 #errors, cannot be dispatched to Int:D
(In my particular case, I was doing factorization with repeated div= calls - it's faster than /= but it didn't work on the string input that otherwise wouldn't have caused any problems.)
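For illustration, here is a minimal sketch of that kind of factorization loop with the coercion done once, explicitly, at the boundary. The `factorize` name and its shape are hypothetical, not the code from my actual use case; it only shows that `div=` works as expected once the input is an Int.

```raku
# Hypothetical helper: trial-division factorization using div=.
# The explicit .Int call stands in for the coercion that div does not perform.
sub factorize($input) {
    my Int $n = $input.Int;      # '360' becomes 360
    my @factors;
    my Int $d = 2;
    while $d * $d <= $n {
        while $n %% $d {         # %% tests divisibility
            @factors.push: $d;
            $n div= $d;          # integer division, stays Int
        }
        $d++;
    }
    @factors.push: $n if $n > 1; # whatever remains is prime
    @factors
}

say factorize('360');            # [2 2 2 3 3 5]
```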
I find this behavior inconsistent with all the other Numeric operators except mod, which stands out in the same way.
Actually, even the gcd and lcm operators coerce to Int - operators that are the most similar both by looks and behavior to div and mod.
I wonder if there is a technical reason for this apparent inconsistency - I have to be honest, even if there is, this seems to go against the overall Raku approach and "marketing" so much that it might be worth revising. From what I know, the implementation change wouldn't be significant or difficult to make - also, it arguably wouldn't make existing code break, only "not-break" (when it was previously expected to).
This is what I found in the source, related to mod:
# NOTE: According to the spec, infix:<mod> is "Not coercive,
# so fails on differing types." Thus no casts here.
But to my understanding, the comment is about Real casting via .Bridge only.
Either way, I tested .Int coercion for mod and div and this doesn't seem to break any tests. If no objections are raised, I'd be in favor of aligning the ops with other arithmetic.

> "Not coercive, so fails on differing types." Thus no casts here.
> But up to my understanding, the comment is about Real casting ...

The code comment may well have been introduced as part of the bridge mechanism work, but the comment it quotes ("Not coercive, so fails on differing types.") is from a design doc commit (repeated for div and mod) by Larry Wall as part of a design clean up introduced by @Larry 13 years ago. It's explicitly general.
But why are those not coercive? There's no reason given.
I did several hours worth of research as part of uncovering an initial tranche of the discussion for this design and documented it in this SO answer.
Please click on the IRC search link to read comments made about it starting in 2006. Click on those individual comments to view the surrounding discussion each time, and pay particular attention to discussions in the minutes/hours/days preceding commits to related design doc or code.
For example, see some relevant IRC discussion in 2010 starting here ("(looking at div, %, and mod) spec"), continuing on through mention of posting Real Operators ("loliblogged"). And also search for "colomon" to see related code and design doc commits.
Or, if you'd rather not spend that time, please consider my recollection that I had concluded it was carefully designed after substantive discussion, with my TL;DR in that SO representing a good distillation of what I thought the design conclusion was, albeit without the underlying rationale (which, iirc, emerged as I read the various discussions).
S03 has this (highlight mine):
> Dispatches to the infix:<div> multi most appropriate to the operand types, returning a value of the same type. Not coercive, so fails on differing types.

Thus the lack of coercion seems to be the logical consequence of wanting a return type matching the argument types.
This still sounds rather cryptic.
On the other hand, I have a very simple question: what makes all other operators, with special regards to gcd and lcm, suited for Numeric coercions, but not mod and div? How to make this seem consistent?
For the old design documents: I also followed the trace based on what I could understand from your remarks. The blog belongs to Solomon Foster - he was the one who changed the description from Larry Wall's "status quo" that said "typically returning a value of the same type", not strictly "returning a value of the same type".
From what I gather, div was meant to be to / what eqv is to ==, hence it was called "generic division" at some point. When core members noticed that this concept is just not feasible, it got repurposed as "integer division", hence pulling the remark regarding the return type - this is what I think wasn't completely thoroughly thought out. Actually, I think it's rather problematic that the "no coercion" clause and the "~~typically~~ returns the same type" clause are from contradicting design versions: one was meant for a "generic division" for which "no coercion" made sense, the other was meant for an "integer division" for which the previous reasoning won't apply per se.
The operators gcd and lcm got added later, thus it's harder to compare the design directly. I can only repeat that I for one see no reason why the rationale of gcd and lcm couldn't apply to div and mod; please do point it out if you do.
Overall I think the fundamental design principles should apply since it's noticeable that these operators didn't have a clear purpose in the time of the old design docs you guys are referring to.
I'm somewhat baffled by the same-type restriction, seemingly artificially applied to what was initially considered an integer operator. I mean, if it's integer then let it be just integer. It is quite reasonable and clear. The "same type" rule follows quite naturally then for in-core candidates.
But the "non-coercive operator" rule is looking contradictory to me. Let's say I have a class which is Int-y in all respects except that it is not inheriting from Int. I.e. it coerces, it plays nicely in all arithmetic operators, etc. Why isn't it allowed to be used with mod and div?
I think the big problem of all previous discussion on the topic is that they all assumed that types mentioned are numeric types leaving aside those which are not but can unambiguously represent a number. Str is one of those. The above mentioned custom class is too. What we can rely upon is that "42".Numeric is Int, "42.13".Numeric is Rat.
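A quick check of that last claim, as a small sketch (the strings are just the examples from the paragraph above):

```raku
# Str.Numeric picks the narrowest numeric type that represents the string.
say "42".Numeric.^name;      # Int
say "42.13".Numeric.^name;   # Rat
```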
Summing up, I'd propose that the default candidates of mod and div would try .Numeric coercion on their operands and then test for the "same type" rule (particular implementation is irrelevant). I.e. we would have:
multi infix:<mod>(\a, \b) {
a.Numeric mod b.Numeric
}
And there be it. Looks like an acceptable compromise solution.
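To try the idea out without touching core, one could sketch it as a user-defined operator. The name `cmod` is hypothetical and chosen only to avoid clashing with the built-in; the explicit Int check is my reading of the "same type" rule for the integer case, not an agreed design.

```raku
# Hypothetical operator: coerce via .Numeric first, then insist on Int operands.
sub infix:<cmod>(\a, \b) {
    my $x = a.Numeric;
    my $y = b.Numeric;
    die "cmod needs integer operands, got {$x.^name} and {$y.^name}"
        unless $x ~~ Int && $y ~~ Int;
    $x mod $y                    # dispatches to the built-in Int candidate
}

say '17' cmod 5;                 # 2
# '17.5' cmod 5 would die: "17.5".Numeric is a Rat, not an Int
```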
Not to focus too much on this: if I knew my input flow might include non-numeric but convertible data, then calling .Numeric on each value wouldn't bother me a single bit. So, while I'm generally in favor of introducing the above candidate, I'm totally OK if the idea is not accepted.
I haven't gone through the material I researched before, but I note that no one has mentioned a key point I recall from that research (though not enough to recall where exactly I saw it, nor even if it's among that material and not somewhere else).
To wit, iirc, a good general purpose language should have a maximal performance integer divide and integer modulus because a lot of algorithms are completely reliant on that for their speed.
Of course, one can always bail to C code, but still.
So perhaps Larry was thinking that ensuring div and mod failed at compile time if they were not given integers would help ensure the code generated for them would be as fast as possible.
Maybe I recall wrongly, or rightly but Larry is wrong, or he's right but wrong to think that making them integer only will one day allow them to be maximal performance, or right about that but wrong to think that worth the anomaly of being non-coercive, or right about that but...
I know Larry dislikes spaces such as GH and I'm with him on that even if I have to date chosen the uncomfortable compromise of holding my nose and using GH. He has commented once or twice on GH in recent years but I suspect he's done with that.
I know he's at a point where he has to conserve his energy for other aspects of life.
I know he is gentle and thoughtful and does his best to be kind toward all (and is inspiringly good at that) and would, I think, only want to engage with those willing to be on their kindest behavior, respectful of his wisdom (even if he wouldn't describe it as such), and paying full attention to what he has to say if he says anything.
With that all out of the way, Larry does occasionally comment on the relevant mailing lists. So if you'd like him to comment on this, perhaps it would be worth posting there.
I question the linked pull request.
The title of this Issue, "div and mod don't convert to Int, unlike everything else" is misleading.
Surely, Perl and Raku both have infix operators that filter their operands? Such as when trying to check numeric equivalence with eq, or string equivalence with == ?
How is it misleading? And an even better question would be: what does it mean that certain operators "filter their operands"? One thing is sure: neither eq, nor == fail on operands that can be coerced into Str and Numeric respectively. Unlike div and mod which do fail on anything that isn't an Int in the first place. And the point is that div and mod are the odd-one-out, since gcd, lcm, numeric + - * / all do coerce.
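A small sketch of the coercing behavior I mean, using only operators already named in this thread:

```raku
# eq coerces both sides to Str, == coerces both sides to Numeric.
say 42 eq '42';      # True
say '42' == 42.0;    # True

# gcd and lcm coerce their operands to Int.
say '6' gcd '4';     # 2
say '6' lcm '4';     # 12
```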
For @raiph : this could make sense but I personally don't see this logic applied anywhere (else), exactly because + - * are also very fundamental operators for integer-related calculations, and yet they all coerce. Also, there are actual traces of div and mod being meant as general purpose equivalents of / and % so I don't think there was an established policy for non-coercive, "performant" operators. Either way, I think consistency would be the most important. I would be happy if div and mod coerced because in my use case, the coercion would only happen once and I would still win performance - but if they don't, then I see no reason for lcm and gcd to coerce.
> The code comment may well have been introduced as part of the bridge mechanism work, but the comment it quotes ("Not coercive, so fails on differing types.") is from a design doc commit (repeated for div and mod) by Larry Wall as part of a design clean up introduced by larry 13 years ago. It's explicitly general.

I suspect @raiph never got a reply on this because he tagged the wrong 'Larry' on Github. His June 7th, 2022 comment should have tagged @TimToady .
The implementation of div converts operands to integers as the problem title describes, unlike mod which leaves them real if they are not integers. See:
say 10.3 / 3.3; # 3.121212
say 10.3 div 3.3; # 3
say 10.3 % 3.3; # 0.4
say 10.3 mod 3.3; # 0.4
I would expect mod to convert the operands to integers first:
10.3 mod 3.3 ≡ 10 mod 3 = 1
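Until that is settled, the expected result can be had by truncating explicitly. This is only a sketch of the workaround, not a statement about what the operators should do:

```raku
# Truncate to Int first, then use the integer operators on Int operands.
say 10.3.Int div 3.3.Int;   # 3
say 10.3.Int mod 3.3.Int;   # 1
```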
It appears that the current implementation is:
| as-is | div | mod |
|---|---|---|
| coerce | 1 | 0 |
| reject | 0 | 0 |
And that there are two sensible alternatives for the desired state:
| to-be v1 | div | mod |
|---|---|---|
| coerce | 0 | 0 |
| reject | 1 | 1 |

| to-be v2 | div | mod |
|---|---|---|
| coerce | 1 | 1 |
| reject | 0 | 0 |
[1] coerce means that the signature becomes
multi sub infix:<div|mod>(Int:D(), Int:D()--> Int:D)
[2] reject means that the signature becomes
multi sub infix:<div|mod>(Int:D,Int:D--> Int:D)
I note that the current gcd and lcm operators take the coercion approach.
However, I think the situation for "do what I mean" is different for div and mod because, unlike gcd and lcm, they have sister Real variants, i.e. / and %. So this means that when I write div or mod in my code, I have deliberately decided to have integer math. Typically I am doing something like counting numerals in a 24-hour clock 23:59:59 ... to which I may add or subtract a count of seconds.
So, to second guess the rationale for this
# NOTE: According to the spec, infix:<mod> is "Not coercive, so fails on differing types." Thus no casts here.
and this
S03 has this (highlight mine):
> Dispatches to the infix:<div> multi most appropriate to the operand types, returning a value of the same type. Not coercive, so fails on differing types.

My take is that to-be v1 was intended, since a fraction of a second showing up in the counter example should be rejected as an error. Why? Because only the coder knows what the true accuracy of their clock is: does the implementation internally store hundredths of seconds (in which case the add should go += δ.round(0.01)), or is it intended to be whole-second accuracy, in which case the add may typically go += δ.round(1), and it might even be OK to go += δ.Int (i.e. += δ.floor) and still meet the specification?
Personally I would use internal accuracy of Rat, but others may choose FatRat, or Num (if they want to count seconds since the big bang). Then I would have some code to go my Int $secs = now.round(1) # now returns e.g. Instant:1712678860.794043723 so has Rat accuracy and use mod to update that to the various digit counters every second.
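A rough sketch of the clock example, to make the point concrete. The `clock` sub and the sample values are hypothetical; the rounding policy is deliberately the caller's decision, which is the argument for having div and mod reject non-integers.

```raku
# Hypothetical 24-hour counter: the Int constraint rejects fractional seconds,
# so the caller has to pick a rounding policy before calling.
sub clock(Int $s) {
    my $day = $s mod 86_400;     # seconds since midnight
    sprintf '%02d:%02d:%02d',
        $day div 3600, ($day mod 3600) div 60, $day mod 60;
}

my Int $secs = 86_399;
say clock($secs);                # 23:59:59

my $delta = 1.75;                # a Rat offset in seconds
$secs += $delta.round;           # explicit policy: round to whole seconds
say clock($secs);                # 00:00:01
```

Calling `clock(1.75)` directly would fail the Int type check, which is the "reject" behavior expressed in user code.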