containers `Int{Map,Set}.compareSize` and associated rules

`Int{Map,Set}.compareSize` and associated rules

Open sjakobi opened this issue 2 years ago • 13 comments

IntMap.size and IntSet.size have complexity O(n). Nevertheless there is a lot of code like

IntMap.size m > CONSTANT

…which could be O(min(n,CONSTANT)) if the IntMap traversal would stop once at most CONSTANT elements have been counted.

For this purpose it would be nice to have a function

compareSize :: IntMap -> Int -> Ordering

and rules that rewrite size comparisons against constants to applications of compareSize.

Similar functionality already exists in bytestring: https://github.com/haskell/bytestring/blob/707e50bac1f1ef7c1729a9246e05f95a4909fd27/Data/ByteString/Lazy.hs#L552-L592

Apr 02 '22 01:04 sjakobi

I think it could be confusing when a rewrite rule changes the complexity of an operation. IMO the better option would be to just expose such a compareSize function.

Apr 02 '22 15:04 konsumlamm

@konsumlamm Not having the rewrite rules would result in fewer programs getting a free speedup out of this change.

Could you clarify where you see the potential confusion and why it would be problematic?

Apr 02 '22 15:04 sjakobi

Rules aren't guaranteed to fire. When they don't, boom. For this one, we may get something like

size m > 5
==>
case $wsize m of
  s -> s ># 5#

Can we rewrite that with rules at all? I'm not optimistic, but it's worth an experiment.

Apr 02 '22 15:04 treeowl

Or, delaying inlining of size, could get

case size m of
  I# s -> s ># 5#

That looks worse, so the best bet is probably writing a size# and seeing if rules can catch it. Try both; not optimistic about either.

Apr 02 '22 15:04 treeowl

Could you clarify where you see the potential confusion and why it would be problematic?

When a refactoring changes the code so that the rules don't fire anymore, the O(1) operation suddenly becomes O(n). Most people won't even be aware of these rewrite rules, so they won't know where the slowdown comes from. I personally also wouldn't expect my code to run in O(1) when it's advertised as O(n).

In general, rewrite rules seem like a really ugly hack to me (you have to be careful to get them to fire and to preserve the behavior of the code) and I hate that they're apparently necessary to get efficient Haskell code.

Apr 02 '22 15:04 konsumlamm

@konsumlamm Yes, the lack of predictability is a downside of rules.

I think the existence of these rules should be documented in the haddocks for size, and there should be a recommendation to use compareSize directly where it applies.

Apr 02 '22 16:04 sjakobi

I think rewrite rules are a horrible way to produce efficient code but they are what is currently used in haskell so I don't see why this issue should be any different on that front.

Apr 02 '22 16:04 Boarders

Alternative (late April fool's) solution: make a function genericSize :: Num a => IntSet -> a, then use it for a = Peano numbers, where the naive implementations of + and < are lazy.

In fact, we can have something like this already today (where s is defined below)

ghci> 10 < I.size s
True
(0.03 secs, 68,016 bytes)

ghci> not $ null $ drop 10 $ I.toList s    -- is equivalent
True
(0.00 secs, 71,344 bytes)   -- and faster

How bad is this in practice? [EDIT - "this" = "performance of I.size", the topic of this issue]

ghci> :set +s
ghci> import qualified Data.IntSet as I
(0.00 secs, 0 bytes)
ghci> let s = I.fromList [1..10^8]
(0.00 secs, 30,888 bytes)
ghci> I.size s
100000000
(8.27 secs, 50,177,957,824 bytes)
ghci> I.size s
100000000
(0.03 secs, 39,480 bytes)   <== we are talking about this

Apr 02 '22 17:04 jwaldmann

In text-2.0 I’m experimenting with measureOff, which is roughly a combination of take and length: for a container a function measureOff :: Word -> a -> Either Word a returns Right (take n a) if there are at least n elements in a and Left (n - length a) otherwise (how many elements are left to take?).

Apr 02 '22 19:04 Bodigrim

@Bodigrim, that looks interesting. The type you're mentioning is a bit different from the one released in text-2.0:

measureOff :: Int -> Text -> Int

Are there any libraries using this function or something similar? I can't find much in text itself.

Apr 02 '22 20:04 sjakobi

I don't think there is any prior art for measureOff.

(https://hackage.haskell.org/package/text-2.0/docs/Data-Text.html#v:measureOff is very low-level for efficiency purposes, it uses positive numbers to signal Right and negative to signal Left)

Apr 03 '22 19:04 Bodigrim

size m > 5
==>
case $wsize m of
  s -> s ># 5#

Can we rewrite that with rules at all? I'm not optimistic, but it's worth an experiment.

As you said yourself the only option is delaying the inlining of size. But a user might end up writing code like:

mySetSize = Set.size

And then the rule can break again. The core issue is that rules can't match on workers and rules on the original function won't match on the worker either. #20364 has more details for the so inclined.

Apr 21 '22 14:04 AndreasPK

@AndreasPK , I guess we could try delaying that inlining and see if we can catch it between specializing the comparison to Int and whenever the case split comes in and makes things hard.

Apr 21 '22 15:04 treeowl

containers containers copied to clipboard

`Int{Map,Set}.compareSize` and associated rules

containers
containers copied to clipboard