containers Imprecision about the identity of keys (when `Eq` is not the smallest reflexive relation)

This issue is related to #290, #291, #52 (that one found by github Related Issues). There are situations where one would distinguish equal keys (in the sense of Eq) and identical keys (in the extreme pointer-identity).

The following is a (unrealistic) mock scenario, but it exemplifies the point. Here Eq on keys is very generous:

import Data.Map (Map)
import qualified Data.Map as Map

newtype Key = Key { theKey :: Integer }
  deriving (Show)

instance Eq Key where
  _ == _ = True

instance Ord Key where
  compare _ _ = EQ

test = show
     $ Map.update (const $ Just "bar") (Key 2)
     $ Map.singleton (Key 1) "foo"

What do we expect test to be? What does the documentation predict? Lets read the documentation for Map.update! https://github.com/haskell/containers/blob/f7273d15adcc9cac1b3853806cae9df056c9b9f3/Data/Map/Internal.hs#L1062 https://github.com/haskell/containers/blob/f7273d15adcc9cac1b3853806cae9df056c9b9f3/Data/Map/Internal.hs#L1053-L1055 According to the documentation "... is (Just y), the key k is bound to the new value y", we expect the key k = Key 2 to be bound to "bar". However, test is:

fromList [(Key {theKey = 1},"bar")]

This means the documentation is imprecise (wrong is a hard word). The precise wording would be "...the existing key (which is == k) is bound to the new value y".

On a more general note, the API for containers prevents the user to conveniently and efficiently inspect and update the keys of finite sets/maps. There seems to be a hidden assumption that no complex data structures are used as keys, and key equality is always uninteresting. This excludes practical scenarios where one would want to use finite maps also as managing data structure for the keys.

In case you wonder about practical scenario: My own scenario is LALR parser generation where I maintain a map m from parse states (keys) to state numbers (values). A parse state is itself a map from parse items (dotted grammar rules) to lookaheads (token sets). LALR fuses parse states that only differ in the lookaheads, thus, for the sake of m I use an equality on parse states that ignores the lookaheads. (Eq is (==) ``on`` keysSet.) However, in the end I want the parse state with the largest lookaheads, thus, I need to update a key of m if I have a version of that key with larger lookaheads.

Jan 06 '19 19:01 andreasabel

Can you point to an example where properties of an API using Eq or Ord are specified even for the case that == (or <=) is not antisymmetric? It is usually avoided - with (implicit) reference to the Haskell standard saying "The Ord class is used for totally ordered datatypes." https://www.haskell.org/onlinereport/haskell2010/haskellch6.html#x13-1270006.3

But the someone will use such non-standard instances, and this will invariably lead to confusion, e.g., https://mail.haskell.org/pipermail/libraries/2018-December/029299.html (this is an instance of Hyrum's law http://www.hyrumslaw.com/ )

"update keys": in general, this would require re-balancing, while updating the value never does. Or do you want to restrict key updates to cases that don't re-balance (because the new key compares EQ to the old one)? How do you want to enforce this?

(Last I checked, implementation (Data.Map) only uses compare, never ==. But that's orthogonal to what you are saying.)

Jan 06 '19 22:01 jwaldmann

#316 seems related.

Jul 15 '20 05:07 sjakobi

I've been thinking about this recently due to #1118, and I think we should address this properly.

Before going further, note that Eq docs state

The Haskell Report defines no laws for Eq. However, instances are encouraged to follow these properties: [...] Extensionality if x == y = True and f is a function whose return type is an instance of Eq, then f x == f y = True [...]

So such instances might be useful but they risk unexpected behavior from any function that expects this property to hold.

The current status of Set and Map is that some functions specify which of multiple equal keys will be selected, and other functions leave it silently unspecified. #1127 explicitly documents that the choice is unspecified for some Set functions.

The current status is unsatisfactory, and I see three options for improving it.

A. Proper support for such keys

This means that we will not only have lookup :: Ord k => k -> Map k a -> Maybe a, but lookup' :: Ord k => k -> Map k a -> Maybe (k, a) because we may want to retrieve the stored key. Likewise, we will need

update' :: Ord k => (a -> Maybe (Bool, a)) -> k -> Map k a -> Map k a - the Bool indicates which key to keep
insert' :: Ord k => k -> a -> Map k a - unlike insert, keeps the original key in the returned map
unionWithKey' :: Ord k => (k -> a -> k -> a -> (Bool, a)) -> Map k a -> Map k a -> Map k a - the combining function takes both keys and returns a Bool indicating which to keep

and more. For Set we will need

lookup' :: Ord a => a -> Set a -> Maybe a (#290)
insert' :: Ord a => (a -> Bool) -> a -> Set a -> Set a - the Bool indicates whether to replace the key
union' :: Ord a => (a -> a -> Bool) -> Set a -> Set a -> Set a - the Bool indicates which key to keep

and so on.

As you can tell, this will explode the already large APIs of Set and Map. It increases the maintenance burden, and it risks confusing the majority of users who don't need this extra power. So I am not in favor of this option.

B. Specify the behavior of functions on such keys

Some functions already do this, this is about doing the same for the rest.

For instance update will specify that it will put the old key in the new map, not the new.

This option is neither here nor there. You will know exactly what happens but what happens may not be what you want. Then we might get requests to add the behavior one wants, which we won't because we are not doing option A.

C. Document that the behavior on such keys is unspecified across the library

This is the most reasonable option to me. It keeps the library in its current scope and manageable.

The few functions that specify a particular behavior today will not be changed, unless there is good reason to do so. For instance, insert :: Ord a => a -> Set a -> Set a specifies that they key will be replaced. If we drop this requirement, we can make it allocation-free to insert a key that is already present by returning the same Set. Of course, this is a breaking change and needs a major release.

Now it is true that such keys can be useful in practice and it is also true that the Set and Map data structures can offer proper support for such keys. In my opinion, this would be best implemented in a separate library, using containers internals, for the rare cases where one needs it. I am happy to advise if someone wants to make this library.

@treeowl @Lysxia if you have opinions on this, let me know.

May 01 '25 15:05 meooow25

Since I don't see any objections, the next steps would be

[ ] Add the necessary documentation. This could tie in with #553.
[ ] (Breaking!) Relax restrictions on functions (like insert) where it will be beneficial
[ ] (Breaking!) Deprecate Map.argSet and Map.fromArgSet, because they endorse the usage of such problematic keys

Luckily the argSet functions seem to be barely used in the wild. There is exactly one library on Hackage which uses it (https://hackage-search.serokell.io/?q=%5C.%28fromA%7Ca%29rgSet%5Cb), which is here and here, and these can switch to toList+fromList with no major effect on performance.

The breaking changes will have to wait for the next major release though, which won't be any time soon.

May 18 '25 14:05 meooow25