containers
containers copied to clipboard
Can we de-duplicate Set/Map by leveraging common structure?
Looking at https://github.com/haskell/containers/pull/817 's
argSet :: Map k a -> Set.Set (Arg k a)
argSet Tip = Set.Tip
argSet (Bin sz kx x l r) = Set.Bin sz (Arg kx x) (argSet l) (argSet r)
it seems the two structures are the same "up to parenthesization" I wonder if one could exploit this to de-duplicate a good deal of code.
It's not as simple as
newtype Map k a = Map (Set (Arg k a))
since
type role Map nominal representational
type role Set nominal
causes
src/Data/Map/Internal.hs:465:1-38: error:
• Role mismatch on variable a:
Annotation says representational but role nominal is required
• while checking a role annotation for ‘Map’
|
465 | type role Map nominal representational
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
but maybe we could have the binary tree structure be its own type and then
newtype Set a = Set (BinTree a)
newtype Map k v = Map (BinTree (Arg k v))
allowing a good deal of functions to simply delegate
null :: Set a -> Bool
null (Set t) = BinTree.null t
null :: Map k a -> Bool
null (Map t) = BinTree.null t
There are obvious examples like null, size, member but Set and Map.Lazy may have more similarities - insert seems pretty much the same "up to parenthesization".
It's not obvious to me whether Map can be decomposed like this, since the extra indirection (in this case data Arg a b) interferes with lazyness and performance.
I believe it would be possible with Backpack, though I don't know how to use that. Unfortunately, stack didn't support Backpack last I checked, and I don't know if anyone's really developing it at the moment.
Currently we have
data Set a
= Bin {-# UNPACK #-} !Size !a !(Set a) !(Set a)
| Tip
data Map k a
= Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
| Tip
which seems to be the same structure we'd get with
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UnliftedNewtypes #-}
import GHC.Exts
data BinTree (a :: TYPE r)
= Bin {-# UNPACK #-} !Size !a !(BinTree a) !(BinTree a)
| Tip
newtype Set a = Set (BinTree a)
newtype Map k v = Map (BinTree (Arg' k v))
newtype Arg' a b = Arg' (# !a, b #)
unfortunately UnliftedNewtypes requires GHC 8.10
I see no newtypes there. As far as I know, we still don't have a story for functions that actually work with that kind of RuntimeRep polymorphism. Even if you can define the type, you can't (AFAIK) define functions to share code across the representations.
I see no newtypes there.
Last 3 lines?
Even if you can define the type, you can't (AFAIK) define functions to share code across the representations.
Actually looks like I can't even define it. Something in my setup was making ghc extra lenient somehow - running ghc directly fails on two places
Main.hs:7:5: error:
• A levity-polymorphic type is not allowed here:
Type: a
Kind: TYPE r
• In the definition of data constructor ‘Bin’
In the data type declaration for ‘BinTree’
|
7 | = Bin {-# UNPACK #-} !Size !a !(BinTree a) !(BinTree a)
Main.hs:13:28: error:
• Unexpected strictness annotation: !a
strictness annotation cannot appear nested inside a type
• In the type ‘(# !a, b #)’
In the definition of data constructor ‘Arg'’
In the newtype declaration for ‘Arg'’
|
13 | newtype Arg' a b = Arg' (# !a, b #)
Deadend it seems
If/when levity polymorphism supports the above, BinTree can probably also be re-used for Data.Dependent.Map.
Though to avoid performance regression from introducing some Data.Some equivalents it might also require first-class existentials to be approved/merged. Something like
newtype DMap k v = DMap (BinTree (exists x. Arg (k x) (v x)))
I believe it would be possible with Backpack, though I don't know how to use that. Unfortunately, stack didn't support Backpack last I checked, and I don't know if anyone's really developing it at the moment.
Nothing much has changed. See https://github.com/commercialhaskell/stack/issues/2540 and https://github.com/commercialhaskell/stack/issues/4745
Actually looks like I can't even define it. Something in my setup was making ghc extra lenient somehow - running ghc directly fails on two places
newtype Arg' a b = Arg' (# !a, b #)
newtypes can't change the semantics of the types they wrap in this way. Here you try to say "make a newtype of a unboxed tuple, but also make the tuple strict in it's first field.
Sadly there is currently also no type that can express the semantics you want here. You would either need strict unboxed tuples: https://gitlab.haskell.org/ghc/ghc/-/issues/17001 (seems unlikely to happen) or one of the things discussed in https://gitlab.haskell.org/ghc/ghc/-/issues/21617 (which I'm working on in my spare time)
In theory GHC could one day support a type like:
data BinTree (a :: TYPE (BoxedRep r))
= Bin {-# UNPACK #-} !Int !a !(BinTree a) !(BinTree a)
But supporting functions which take levity polymorphic arguments is unlikely to happen as GHC often needs/wants to generate different code depending on the liftedness of the argument.
What about doing it the other way?
-- Just like today:
data Map k a
= Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
| Tip
-- New: Re-use Map, but with unit as value type
-- a has the role `nominal` as required :-)
newtype Set a = Set (Map a ())
This approach is used in many other programming languages, exactly to improve code-reusage between the two datastructures. Essentially the whole interface of Set can now directly be implemented in terms of Map.
I believe this alternate definition of Set does not differ in strictness of its contents from the current one (though I might have missed something of course), which makes it nicely backwards-compatible.
newtype Set a = Set (Map a ())
This would increase the size of the Set constructor by an additional pointer. Which is probably not worthwhile when the Set code is already written.
What about doing it the other way?
That's how it's done for HashSet / HashMap
https://github.com/haskell-unordered-containers/unordered-containers/blob/259dc9e4191d988a7f84c3d417764a32e7c8e29f/Data/HashSet/Internal.hs#L112-L114
It has the issue pointed above but I also wonder if it's morally "backwards". It's defining the more general thing first then specializing it, whereas newtype Map k v = Map (BinTree (Arg k v)) decomposes into two simpler constructs.
It's easier to obtain multiple shapes (e.g. Map, DMap as defined above) by recombining simple pieces than finding a sufficiently general one that can be specialized to all the shapes.