Imprecision about the identity of keys (when `Eq` is not the smallest reflexive relation)
This issue is related to #290, #291, #52 (that one found by github Related Issues). There are situations where one would distinguish equal keys (in the sense of Eq) and identical keys (in the extreme pointer-identity).
The following is a (unrealistic) mock scenario, but it exemplifies the point. Here Eq on keys is very generous:
import Data.Map (Map)
import qualified Data.Map as Map
newtype Key = Key { theKey :: Integer }
deriving (Show)
instance Eq Key where
_ == _ = True
instance Ord Key where
compare _ _ = EQ
test = show
$ Map.update (const $ Just "bar") (Key 2)
$ Map.singleton (Key 1) "foo"
What do we expect test to be? What does the documentation predict?
Lets read the documentation for Map.update!
https://github.com/haskell/containers/blob/f7273d15adcc9cac1b3853806cae9df056c9b9f3/Data/Map/Internal.hs#L1062
https://github.com/haskell/containers/blob/f7273d15adcc9cac1b3853806cae9df056c9b9f3/Data/Map/Internal.hs#L1053-L1055
According to the documentation "... is (Just y), the key k is bound to the new value y", we expect the key k = Key 2 to be bound to "bar".
However, test is:
fromList [(Key {theKey = 1},"bar")]
This means the documentation is imprecise (wrong is a hard word). The precise wording would be "...the existing key (which is == k) is bound to the new value y".
On a more general note, the API for containers prevents the user to conveniently and efficiently inspect and update the keys of finite sets/maps. There seems to be a hidden assumption that no complex data structures are used as keys, and key equality is always uninteresting. This excludes practical scenarios where one would want to use finite maps also as managing data structure for the keys.
In case you wonder about practical scenario:
My own scenario is LALR parser generation where I maintain a map m from parse states (keys) to state numbers (values). A parse state is itself a map from parse items (dotted grammar rules) to lookaheads (token sets). LALR fuses parse states that only differ in the lookaheads, thus, for the sake of m I use an equality on parse states that ignores the lookaheads. (Eq is (==) ``on`` keysSet.) However, in the end I want the parse state with the largest lookaheads, thus, I need to update a key of m if I have a version of that key with larger lookaheads.
Can you point to an example where properties of an API using Eq or Ord are specified even for the case that == (or <=) is not antisymmetric? It is usually avoided - with (implicit) reference to the Haskell standard saying "The Ord class is used for totally ordered datatypes." https://www.haskell.org/onlinereport/haskell2010/haskellch6.html#x13-1270006.3
But the someone will use such non-standard instances, and this will invariably lead to confusion, e.g., https://mail.haskell.org/pipermail/libraries/2018-December/029299.html (this is an instance of Hyrum's law http://www.hyrumslaw.com/ )
"update keys": in general, this would require re-balancing, while updating the value never does. Or do you want to restrict key updates to cases that don't re-balance (because the new key compares EQ to the old one)? How do you want to enforce this?
(Last I checked, implementation (Data.Map) only uses compare, never ==. But that's orthogonal to what you are saying.)
#316 seems related.
I've been thinking about this recently due to #1118, and I think we should address this properly.
Before going further, note that Eq docs state
The Haskell Report defines no laws for
Eq. However, instances are encouraged to follow these properties: [...] Extensionality ifx == y=Trueandfis a function whose return type is an instance ofEq, thenf x == f y=True[...]
So such instances might be useful but they risk unexpected behavior from any function that expects this property to hold.
The current status of Set and Map is that some functions specify which of multiple equal keys will be selected, and other functions leave it silently unspecified. #1127 explicitly documents that the choice is unspecified for some Set functions.
The current status is unsatisfactory, and I see three options for improving it.
A. Proper support for such keys
This means that we will not only have lookup :: Ord k => k -> Map k a -> Maybe a, but lookup' :: Ord k => k -> Map k a -> Maybe (k, a) because we may want to retrieve the stored key. Likewise, we will need
-
update' :: Ord k => (a -> Maybe (Bool, a)) -> k -> Map k a -> Map k a- theBoolindicates which key to keep -
insert' :: Ord k => k -> a -> Map k a- unlikeinsert, keeps the original key in the returned map -
unionWithKey' :: Ord k => (k -> a -> k -> a -> (Bool, a)) -> Map k a -> Map k a -> Map k a- the combining function takes both keys and returns aBoolindicating which to keep
and more. For Set we will need
-
lookup' :: Ord a => a -> Set a -> Maybe a(#290) -
insert' :: Ord a => (a -> Bool) -> a -> Set a -> Set a- theBoolindicates whether to replace the key -
union' :: Ord a => (a -> a -> Bool) -> Set a -> Set a -> Set a- theBoolindicates which key to keep
and so on.
As you can tell, this will explode the already large APIs of Set and Map. It increases the maintenance burden, and it risks confusing the majority of users who don't need this extra power. So I am not in favor of this option.
B. Specify the behavior of functions on such keys
Some functions already do this, this is about doing the same for the rest.
For instance update will specify that it will put the old key in the new map, not the new.
This option is neither here nor there. You will know exactly what happens but what happens may not be what you want. Then we might get requests to add the behavior one wants, which we won't because we are not doing option A.
C. Document that the behavior on such keys is unspecified across the library
This is the most reasonable option to me. It keeps the library in its current scope and manageable.
The few functions that specify a particular behavior today will not be changed, unless there is good reason to do so. For instance, insert :: Ord a => a -> Set a -> Set a specifies that they key will be replaced. If we drop this requirement, we can make it allocation-free to insert a key that is already present by returning the same Set. Of course, this is a breaking change and needs a major release.
Now it is true that such keys can be useful in practice and it is also true that the Set and Map data structures can offer proper support for such keys. In my opinion, this would be best implemented in a separate library, using containers internals, for the rare cases where one needs it. I am happy to advise if someone wants to make this library.
@treeowl @Lysxia if you have opinions on this, let me know.
Since I don't see any objections, the next steps would be
- [ ] Add the necessary documentation. This could tie in with #553.
- [ ] (Breaking!) Relax restrictions on functions (like
insert) where it will be beneficial - [ ] (Breaking!) Deprecate
Map.argSetandMap.fromArgSet, because they endorse the usage of such problematic keys
Luckily the argSet functions seem to be barely used in the wild. There is exactly one library on Hackage which uses it (https://hackage-search.serokell.io/?q=%5C.%28fromA%7Ca%29rgSet%5Cb), which is here and here, and these can switch to toList+fromList with no major effect on performance.
The breaking changes will have to wait for the next major release though, which won't be any time soon.