pharo
pharo copied to clipboard
Indistinguishable `SortedCollection`s often considered not-equal due to sortBlock
Bug description
SortedCollection includes its sortBlock as part of equality comparison. This is reasonable enough—worth noting that the two sortBlocks have to result in the same sort order or the collections will already be not-equal just from comparing the elements, but just because the two sortBlocks happen to sort these elements the same, that doesn't imply that they will always act the same (and if the collection has 0 or 1 elements, we know nothing at all about the sorting).
However, BlockClosure itself has no implementation of #=, so two SortedCollections created by e.g. a factory method that assigns a sort block will always be not-equal, which is problematic.
To Reproduce
- Most blatantly, evaluate:
(SortedCollection sortBlock: [:a :b | b < a]) = (SortedCollection sortBlock: [:a :b | b < a]) - More realistically (this is how I discovered this), have a method like:
Person class>>lastNameFirstNameSortBlock
^[:a :b | a lastName < b lastName or: [a lastName = b lastName and: [a firstName < b firstName]]]
and then have a codepath that ultimately reduces to (collA asSortedCollection: Person lastNameFirstNameSortBlock) = (collB asSortedCollection: Person lastNameFirstNameSortBlock) (and as it happens collA and collB do have the same contents).
3. Observe false return
Expected behavior
true return for same contents and equivalent/indistinguishable sortBlock.
Possible solutions
The simplest possible solution is to just not compare the sortBlock, assuming that if the contents sort in the same order, the sort blocks are equivalent enough. In most cases this is probably a good assumption—something like a reversed sortBlock will basically always produce a different order, and comparing two SortedCollections drawn from the same set of objects (such that there's any possibility of the contents being the same), with different sorting but that might happen to produce the same order in some cases, and caring that we get a not-equal result in those cases...seems pretty strange to me. The one obvious exception is if both collections are empty or have a single element, where the sortBlock doesn't come into play—but even then, especially in the empty case, it would usually be weird to care that the sortBlock is different. Not always though. (This is the behavior in Dolphin Smalltalk.)
The other possibility is to implement a proper BlockClosure>>=. As it happens we're already partway there—any sortBlock that can be a CleanBlockClosure will already compare as equal if that compiler option is turned on. But I can think of legitimate use-cases for sortBlocks that must be full blocks due to closed-over values. As a somewhat-contrived example, I might want a factory for a sortBlock that goes lastName, firstName, <other property of my choice>, and write sortBlockLastNameFirstNameThen: aSymbol. To make this case work, we would need a proper by-value comparison of blocks. Seems like this consists of:
- The code of the block, for which I think comparing the
compiledBlockis sufficient. This even handles the case of two equivalent blocks written in the same method! - The receiver and any copied values. It looks like copied values which may be mutated later in the method, or in another block, get wrapped in arrays, in which case we would want to compare the array by identity—"this is the same variable" not "it has the same value". Though it may be syntactically required that this be the case anyway, or else the
compiledBlockwould be different—I'm going to assume that this is the case. - Clearly the
outerContextshould be involved somehow, but asContextdoes not (and really cannot) itself implement#=meaningfully this puts us right back where we started. I think it's sufficient to compare it if and only if the block (both of them, as we already know the code is the same) contains a far return—in that case identity is the right comparison, because the question being asked is "do we return to the same place".
Well, having written all that, I figured I might as well go ahead and take a stab at the implementation:
= aBlockClosure
self == aBlockClosure ifTrue: [ ^ true ].
self class = aBlockClosure class ifFalse: [ ^ false ].
self compiledBlock = aBlockClosure compiledBlock ifFalse: [ ^ false ].
self receiver = aBlockClosure receiver ifFalse: [ ^ false ].
"If we have the same compiledBlock, we must have the same number of copied values as well"
1 to: self numCopiedValues do: [ :i |
(self copiedValueAt: i) = (aBlockClosure copiedValueAt: i) ifFalse: [
^ false ] ].
"Directly compare outerContext only if we have a non-local return, as Context uses identity
so will generally *not* be equal. Again, aBlockClosure must have the same answer to
#hasNonLocalReturn as this is a property of the compiledBlock, which we know is equal."
self hasNonLocalReturn ifTrue: [
self outerContext = aBlockClosure outerContext ifFalse: [ ^ false ] ].
^ true
One potential downside of this approach—it's pretty slow: [:a :b | a < b] = [:a :b | a < b] takes about as long as comparing a 30-element OrderedCollection (or the elements of a 30-element SortedCollection), which is far longer than creating the two blocks themselves. Of course, in the identical-clean-blocks case we will early-out much more quickly, and not-equal blocks will generally fail much more quickly as well (e.g. different bytecode lengths). And I would hope that comparing small SortedCollections isn't usually a performance bottleneck...
Version information: Any OS/any Pharo
Expected development cost
The above implementation of #= might be correct...but I definitely want someone with appropriate Deep Lore knowledge to look it over! I can make it a PR if that's helpful but it's very much WIP.
This is an interesting problem... this seems to be one of these cases where in practice, the block is often clean and in that case, comparing should be much easier.
Yes, the clean-block case is the most common for sure, and in that case, barring recompilation of the home method, identity is sufficient. However if one side of the comparison is "long-lived"/was created a while ago, recompilation becomes a real possibility. And as I said, I think the case of a block with (usually just one) copied temp, or which uses the receiver, does have validity here. Consider for instance:
Symbol>>asSortBlock
^[:a :b | (a perform: self) < (b perform: self)]
And we would expect #foo asSortBlock = #foo asSortBlock, but #bar asSortBlock ~= #foo asSortBlock. (Yes, this particular case could be handled by a Symbol>>value:value: implementation directly, but what if we want that to be: value: rcv value: arg1 ^rcv perform: self with: arg1, or simply don't want to implement either form because of the ambiguity?)
I just stumbled across this in a test where two empty SortedCollections compared to not equal. They had the same sort block. I only cared about the contents equality so it took me a bit to figure this out. My guess is that others will stumble upon it, too.
If I had a choice, I would just compare contents of the collections, not the sortBlock.
My 2 cents.