containers icon indicating copy to clipboard operation
containers copied to clipboard

Data.IntMap.size should be O(1)

Open nshepperd opened this issue 4 years ago • 9 comments

The implementation of size for IntMaps is slower than it could be. This tripped me up when order for Data.Graph.Inductive.PatriciaTree turned out to be slower than I expected. The tree type should keep track of the number of inserted elements with an auxilliary variable, in the same way Data.Map does. Alternatively, if this is an intentional design choice, a comment could be added explaining why size needs to be O(n)?

-- | /O(n)/. Number of elements in the map.
--
-- > size empty                                   == 0
-- > size (singleton 1 'a')                       == 1
-- > size (fromList([(1,'a'), (2,'c'), (3,'b')])) == 3
size :: IntMap a -> Int
size = go 0
  where
    go !acc (Bin _ _ l r) = go (go acc l) r
    go acc (Tip _ _) = 1 + acc
    go acc Nil = acc

nshepperd avatar Jan 24 '21 04:01 nshepperd

Tracking the size is not free, especially when considering intersections and unions.

On Sat, Jan 23, 2021, 11:48 PM N Shepperd [email protected] wrote:

The implementation of size for IntMaps is slower than it could be. This tripped me up when order for Data.Graph.Inductive.PatriciaTree turned out to be slower than I expected. The tree type should keep track of the number of inserted elements with an auxilliary variable, in the same way Data.Map does. Alternatively, if this is an intentional design choice, a comment could be added explaining why size needs to be O(n)?

-- | /O(n)/. Number of elements in the map.

-- > size empty == 0 -- > size (singleton 1 'a') == 1 -- > size (fromList([(1,'a'), (2,'c'), (3,'b')])) == 3 size :: IntMap a -> Int size = go 0 where go !acc (Bin _ _ l r) = go (go acc l) r go acc (Tip _ _) = 1 + acc go acc Nil = acc

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/763, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7MC6G2PJV4KG6JXF43S3ORAFANCNFSM4WQHNOHQ .

treeowl avatar Jan 24 '21 05:01 treeowl

True, but it should not be asymptotically slower.

nshepperd avatar Jan 24 '21 05:01 nshepperd

To accomplish that, you need to increase the size of each node. Basically everything except size/index operations will get slower, and those aren't really the raison d'être of the structure

On Sun, Jan 24, 2021, 12:18 AM N Shepperd [email protected] wrote:

True, but it should not be asymptotically slower.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/763#issuecomment-766293495, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7OT3PMO3NIFSCFNM7DS3OUUDANCNFSM4WQHNOHQ .

treeowl avatar Jan 24 '21 05:01 treeowl

To accomplish that, you need to increase the size of each node.

I don't think this would necessarily be the case. There is a similar PR in unordered-containers for HashMaps which just stores the size once. But this would require to recompute the size after certain operations, so I don't know if this would be worth it.

benjamin-rdm avatar Jan 24 '21 11:01 benjamin-rdm

Maybe one could add a type IntMapSized which is just an alias to IntMap but with the sized stored at a top level. At least from my point of view, getting size of an IntMap is not something one regularly needs, so sacrificing performance by storing it in the IntMap type is probably a bad idea.

benjamin-rdm avatar Jan 24 '21 11:01 benjamin-rdm

To make things slightly worse, Map.size is O(1) but IntMap.size is O(n)

pepeiborra avatar Apr 11 '21 08:04 pepeiborra

That's not bad. They're just different data structures.

On Sun, Apr 11, 2021, 4:38 AM Pepe Iborra @.***> wrote:

To make things slightly worse, Map.size is O(1) but IntMap.size is O(n)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/763#issuecomment-817271307, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7LQIAPME3RJO6YWXTLTIFNXLANCNFSM4WQHNOHQ .

treeowl avatar Apr 11 '21 19:04 treeowl

It's somewhat expectation breaking - containers typically has the optimal asymptotics, and it is intuitive to expect IntMap to be a strict performance improvement over Map Int, but making that change can actually slow down your program asymptotically.

With a hackage search I found two packages which had the asymptotic runtime of a function calling IntMap.size incorrectly documented as O(1) (algabraic-graphs, linkedhashmap), as well as seven packages where IntMap.size was called in a context where it would clearly cause unintended quadratic behaviour (commodities-0.2.0.1, data-fix-cse-0.0.3, gf-3.10, homeomorphic-0.1, ipopt-hs-0.5.1.0, lambdacube-core-0.2.0, recommender-als-0.2.1.1)

However, even so I can't really argue with "not many people use this and it will slow other operations down", at least without some sort of benchmark quantifying the cost. These mostly aren't particularly significant packages. (Other libraries like fgl which export functions wrapping IntMap.size might be more momentous but chasing down those reverse dependencies is harder.)

nshepperd avatar Apr 25 '21 05:04 nshepperd

I would be okay with adding extra documentation. Does anyone have a sized version we could link to?

On Sun, Apr 25, 2021, 1:27 AM N Shepperd @.***> wrote:

It's somewhat expectation breaking - containers typically has the optimal asymptotics, and it is intuitive to expect IntMap to be a strict performance improvement over Map Int, but making that change can actually slow down your program asymptotically.

With a hackage search I found two packages which had the asymptotic runtime of a function calling IntMap.size incorrectly documented as O(1) (algabraic-graphs, linkedhashmap), as well as seven packages where IntMap.size was called in a context where it would clearly cause unintended quadratic behaviour (commodities-0.2.0.1, data-fix-cse-0.0.3, gf-3.10, homeomorphic-0.1, ipopt-hs-0.5.1.0, lambdacube-core-0.2.0, recommender-als-0.2.1.1)

However, even so I can't really argue with "not many people use this and it will slow other operations down", at least without some sort of benchmark quantifying the cost. These mostly aren't particularly significant packages. (Other libraries like fgl which export functions wrapping IntMap.size might be more momentous but chasing down those reverse dependencies is harder.)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/763#issuecomment-826259896, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7L4HWJVSRT5WU2BKC3TKOR3HANCNFSM4WQHNOHQ .

treeowl avatar Apr 25 '21 11:04 treeowl