Unboxed sums for performance (Discussion)
It would be useful for us to have a general discussion on places where unboxing and similar techniques might have any benefit. For example there is an old issue here: https://github.com/haskell/containers/pull/523 which tries to use them for alter from IntMap. In general I would like to gather a list of possible places where we think they could be beneficial so I (or others) can investigate the performance implications.
I think lookup for IntMap is a pretty obvious spot. Consider this implementation of member using lookup:
member :: Int -> IntMap a -> Bool
member i m = case lookup i m of
Just _ -> True
Nothing -> False
This produces the following Core:
member :: forall a. Int -> IntMap a -> Bool
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=<S,1*U(U)><S,1*U>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (@ a_aDp)
(i_aCM [Occ=Once!] :: Int)
(m_aCN [Occ=Once] :: IntMap a_aDp) ->
case i_aCM of { GHC.Types.I# ww1_aHF [Occ=Once] ->
case Data.IntMap.Internal.$wlookup @ a_aDp ww1_aHF m_aCN of {
Nothing -> GHC.Types.False;
Just _ [Occ=Dead] -> GHC.Types.True
}
}}]
member
= \ (@ a_aDp) (i_aCM :: Int) (m_aCN :: IntMap a_aDp) ->
case i_aCM of { GHC.Types.I# ww1_aHF ->
case Data.IntMap.Internal.$wlookup @ a_aDp ww1_aHF m_aCN of {
Nothing -> GHC.Types.False;
Just ds_dHV -> GHC.Types.True
}
}
We allocate a Just constructor on every successful lookup, even when we immediately deconstruct it. We could rewrite lookup thus:
lookup :: Key -> IntMap a -> Maybe a
lookup k m = case lookup# k m of
(# (##) | #) -> Nothing
(# | a #) -> Just a
lookup# :: Key -> IntMap a -> (# (##) | a #)
lookup# !k (Bin p m l r)
| zero k m = lookup# k l
| otherwise = lookup# k r
lookup# k (Tip kx x)
| k == kx = (# | x #)
| otherwise = (# (##) | #)
lookup# _ Nil = (# (##) | #)
lookup will generally inline away, leaving the non-allocating lookup#. That certainly works for member. We just have to verify that this actually improves performance.