containers
containers copied to clipboard
`Int{Map,Set}.compareSize` and associated rules
IntMap.size and IntSet.size have complexity O(n). Nevertheless there is a lot of code like
IntMap.size m > CONSTANT
…which could be O(min(n,CONSTANT)) if the IntMap traversal would stop once at most CONSTANT elements have been counted.
For this purpose it would be nice to have a function
compareSize :: IntMap -> Int -> Ordering
and rules that rewrite size comparisons against constants to applications of compareSize.
Similar functionality already exists in bytestring: https://github.com/haskell/bytestring/blob/707e50bac1f1ef7c1729a9246e05f95a4909fd27/Data/ByteString/Lazy.hs#L552-L592
I think it could be confusing when a rewrite rule changes the complexity of an operation. IMO the better option would be to just expose such a compareSize function.
@konsumlamm Not having the rewrite rules would result in fewer programs getting a free speedup out of this change.
Could you clarify where you see the potential confusion and why it would be problematic?
Rules aren't guaranteed to fire. When they don't, boom. For this one, we may get something like
size m > 5
==>
case $wsize m of
s -> s ># 5#
Can we rewrite that with rules at all? I'm not optimistic, but it's worth an experiment.
Or, delaying inlining of size, could get
case size m of
I# s -> s ># 5#
That looks worse, so the best bet is probably writing a size# and seeing if rules can catch it. Try both; not optimistic about either.
Could you clarify where you see the potential confusion and why it would be problematic?
When a refactoring changes the code so that the rules don't fire anymore, the O(1) operation suddenly becomes O(n). Most people won't even be aware of these rewrite rules, so they won't know where the slowdown comes from. I personally also wouldn't expect my code to run in O(1) when it's advertised as O(n).
In general, rewrite rules seem like a really ugly hack to me (you have to be careful to get them to fire and to preserve the behavior of the code) and I hate that they're apparently necessary to get efficient Haskell code.
@konsumlamm Yes, the lack of predictability is a downside of rules.
I think the existence of these rules should be documented in the haddocks for size, and there should be a recommendation to use compareSize directly where it applies.
I think rewrite rules are a horrible way to produce efficient code but they are what is currently used in haskell so I don't see why this issue should be any different on that front.
Alternative (late April fool's) solution: make a function genericSize :: Num a => IntSet -> a, then use it for a = Peano numbers, where the naive implementations of + and < are lazy.
In fact, we can have something like this already today (where s is defined below)
ghci> 10 < I.size s
True
(0.03 secs, 68,016 bytes)
ghci> not $ null $ drop 10 $ I.toList s -- is equivalent
True
(0.00 secs, 71,344 bytes) -- and faster
How bad is this in practice? [EDIT - "this" = "performance of I.size", the topic of this issue]
ghci> :set +s
ghci> import qualified Data.IntSet as I
(0.00 secs, 0 bytes)
ghci> let s = I.fromList [1..10^8]
(0.00 secs, 30,888 bytes)
ghci> I.size s
100000000
(8.27 secs, 50,177,957,824 bytes)
ghci> I.size s
100000000
(0.03 secs, 39,480 bytes) <== we are talking about this
In text-2.0 I’m experimenting with measureOff, which is roughly a combination of take and length: for a container a function measureOff :: Word -> a -> Either Word a returns Right (take n a) if there are at least n elements in a and Left (n - length a) otherwise (how many elements are left to take?).
@Bodigrim, that looks interesting. The type you're mentioning is a bit different from the one released in text-2.0:
measureOff :: Int -> Text -> Int
Are there any libraries using this function or something similar? I can't find much in text itself.
I don't think there is any prior art for measureOff.
(https://hackage.haskell.org/package/text-2.0/docs/Data-Text.html#v:measureOff is very low-level for efficiency purposes, it uses positive numbers to signal Right and negative to signal Left)
size m > 5
==>
case $wsize m of
s -> s ># 5#
Can we rewrite that with rules at all? I'm not optimistic, but it's worth an experiment.
As you said yourself the only option is delaying the inlining of size. But a user might end up writing code like:
mySetSize = Set.size
And then the rule can break again. The core issue is that rules can't match on workers and rules on the original function won't match on the worker either. #20364 has more details for the so inclined.
@AndreasPK , I guess we could try delaying that inlining and see if we can catch it between specializing the comparison to Int and whenever the case split comes in and makes things hard.