containers icon indicating copy to clipboard operation
containers copied to clipboard

Enum k => EnumMap k v as generatisation of IntMap v ?

Open jwaldmann opened this issue 9 years ago • 22 comments
trafficstars

Sometimes I want IntMap performance, but Enum k typing for more expressive source code.

This can be done with a heap of boilerplate code: https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox/blob/master/src/Matchbox/Map.hs - and I hope that ghc will make sure it has no run-time cost.

Could (something like) this become part of containers?

Or does this have a totally different solution (with associated types perhaps) that should not be part of containers?

jwaldmann avatar Apr 26 '16 11:04 jwaldmann

I like this idea. Could you please write up a proposal and send it to [email protected]? On Apr 26, 2016 7:14 AM, "jwaldmann" [email protected] wrote:

Sometimes I want IntMap performance, but Enum k typing for more expressive source code.

This can be done with a heap of boilerplate code: https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox/blob/master/src/Matchbox/Map.hs

  • and I hope that ghc will make sure it has no run-time cost.

Could (something like) this become part of containers?

Or does this have a totally different solution (with associated types perhaps) that should not be part of containers?

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/haskell/containers/issues/210

treeowl avatar Apr 26 '16 12:04 treeowl

Will do.

jwaldmann avatar Apr 29 '16 08:04 jwaldmann

I think I have defined such a map before, and I am sure many have.

But I am sceptical about the Enum type class. It happens to have fromEnum and toEnum, but also many other functions that are unrelated to the task at hand. Furthermore, there are instances that a definitely not suitable to be used with such an IntLikeMap, in particular Double. I see two way out:

  • Replace Enum a to Coercible Int a. This would give better performance guarantees, and ensure that no invalid instances are given, and users who have newtype Something = Something Int could use the IntLikeMap with Something keys without additional boiler plate code.
  • Replace Enum a with a new class IntLike a with just methods toIntLike and fromIntLike, and specify that this class is meant for types than can injectively embedded into Int. A bit more expressive and general, but requires more boiler plate code. There could be a default implementation in terms of Coercible, so that for newtypes the instance is simple.

Also note that in some applications one might only have a fooToInt function for keys, and is content with using only the methods of Data.IntMap that consume a key, but not functions like keys or mapWithKey. So maybe, the type class IntLike should have a superclass ToInt with the a -> Int method, and the methods of IntMap using only that functionality should have that constraint?

So the design space is not small. I would advocate putting this into a package of its own with a module name that does not clash with containers, and find good names and test and refine the API and get people to use it, with the plan to possibly merge it into containers if it proves its merit.

nomeata avatar May 01 '16 19:05 nomeata

I like your general IntLike class, but I think the method names are wrong for the same reason I dislike the equivalents in Enum. I'd much prefer toInt and fromInt. Another idea would be to split these:

class ToInt a where toInt :: a -> Int class FromInt a where fromInt :: Int -> a

One reason to split these is the single-method optimization; the other is that some types may only support one or the other.

What laws are you hoping for? Are Word8 and Bool Int-like? On May 1, 2016 3:17 PM, "Joachim Breitner" [email protected] wrote:

I think I have defined such a map before, and I am sure many have.

But I am sceptical about the Enum type class. It happens to have fromEnum and toEnum, but also many other functions that are unrelated to the task at hand. Furthermore, there are instances that a definitely not suitable to be used with such an IntLikeMap, in particular Double. I see two way out:

  • Replace Enum a to Coercible Int a. This would give better performance guarantees, and ensure that no invalid instances are given, and users who have newtype Something = Something Int could use the IntLikeMap with Something keys without additional boiler plate code.
  • Replace Enum a with a new class IntLike a with just methods toIntLike and fromIntLike, and specify that this class is meant for types than can injectively embedded into Int. A bit more expressive and general, but requires more boiler plate code. There could be a default implementation in terms of Coercible, so that for newtypes the instance is simple.

Also note that in some applications one might only have a fooToInt function for keys, and is content with using only the methods of Data.IntMap that consume a key, but not functions like keys or mapWithKey. So maybe, the type class IntLike should have a superclass ToInt with the a -> Int method, and the methods of IntMap using only that functionality should have that constraint?

So the design space is not small. I would advocate putting this into a package of its own with a module name that does not clash with containers, and find good names and test and refine the API and get people to use it, with the plan to possibly merge it into containers if it proves its merit.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/haskell/containers/issues/210#issuecomment-216065449

treeowl avatar May 01 '16 19:05 treeowl

but I think the method names are wrong

I payed no attention to naming yet, so feel free to come up with better ones.

I’m not sure if FromInt makes sense for types that have no ToInt, and if it does not, then ToInt should be a superclass of FromInt.

The law for ToInt would be that toInt is an injective function, i.e. toInt a = toInt b ⟹ a = b.

The law for FromInt would be that fromInt (toInt a) = a. I think it would be ok to have fromInt only defined on the image of toInt, at least for the use case we have in minds.

nomeata avatar May 01 '16 19:05 nomeata

If using new classes in lieu of Enum, another option to consider is the classes from prelude-safeenum. (I'm not saying they're necessarily the right idea here; just another option to consider.)

If all we're doing is converting to and from Int, with injectivity and invertability as the only laws, then why not just use the HashMap of unordered-containers? The key reason to use IntMap with Enum (or similar) is that it maintains the ordering of elements; so we'd also want toInt to preserve relative ordering of the type.

wrengr avatar May 02 '16 03:05 wrengr

"preserve relative ordering" - Yes, that could be important for my application, as it might reduce runtime for intersections, because disjointness of maps can be detected early. There is no shortcut for intersecting hashmaps?

jwaldmann avatar May 02 '16 08:05 jwaldmann

I don’t think that preserving relative ordering (with regard to the Ord instance of the data type) is required for efficient intersection of disjoint IntMaps. The only benefit is that toAscList is as efficient as toList; sometimes nice to have, but probably very important here.

nomeata avatar May 02 '16 08:05 nomeata

Intersection worst case is at least linear in both inputs, even if they represent disjoint sets of keys.

Example: one map has [0,2 .. N] as keys, other map has [1,3 .. N+1] - you certainly need to visit both completely. (Well, unless the root branches on the last bit of the key, but then the example can be modified.)

If there is some order in the tree/trie structure, then we could get lucky in case one key set is [ .. N/2] and the other is [N/2+1 .. ], this can be detected by rightmost_key(a) < leftmost_key(b), and the Data.Map algorithm does this, cf. https://github.com/haskell/containers/issues/177, and I assume Data.IntMap does it as well.

For Hashmaps, there never is such a "lucky case"?

I now made the Map implementation switchable in my benchmark https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark/blob/master/src/Matchbox/Map.hs and it is definitely slower with HashMap, altough I have not profiled this in-depth (would need -auto-all on the libraries).

For the built-in test case (running main with no arguments)

Data.IntMap  :   6.7 sec
Data.Map     :   9.7 sec
Data.HashMap : 200 sec

I am definitely seeing M.intersectionWith as a hotspot (https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark/blob/master/src/Matchbox/Relation.hs#L208)

jwaldmann avatar May 02 '16 14:05 jwaldmann

Note that HashMap does have different performance tradeoffs compared to IntMap just due to the internal representation of the tree. Which is why I'm thinking of adding a non-hashed version of HashMap to a future release (i.e., an AMT (array mapped trie) in lieu of the HAMT (hashed AMT)).

IntMaps are binary trees internally, so they're cheap to build/change but require more pointer chasing to read from. Whereas HashMaps use arrays internally to help flatten the tree, meaning they are far better behaved re cache and pointer chasing, but they're also more expensive to build/change.

wrengr avatar May 04 '16 05:05 wrengr

Yes. I am partially aware of these tradeoffs. The text of your message should be atop the docs for containers.

jwaldmann avatar May 04 '16 09:05 jwaldmann

I simply use the package: https://hackage.haskell.org/package/enummapset

amigalemming avatar Jun 26 '16 05:06 amigalemming

I think that Enum is the right abstraction. It is enough to define fromEnum and toEnum, the other functions are optimizations. instance Enum Double is just wrong, because there is no sensible fromEnum implementation.

amigalemming avatar Jun 26 '16 06:06 amigalemming

I would agree that Enum is most likely conceptually the right abstraction for this. I do however think Enum is in a pretty bad place right now, with the Double/Float instances and the lack of an Ord superclass, plus the inevitably partial succ and pred.

I would be much happier to use it if Haskell/base had a proper ordering hierarchy with Eq => Ord, Ord => Enum, Ord => Bounded and a bunch of other classes like lattices and partial orderings.

tysonzero avatar Dec 05 '18 08:12 tysonzero

Enum can't decide whether it's for things that look sufficiently like Int (e.g., Word8), or for things supporting successors and predecessors (e.g., Integer). I don't think it's the right way.

treeowl avatar Dec 05 '18 14:12 treeowl

IMO Enum should mean graded totally ordered set without the non-negative rank requirement.

So in other words it should be a bijection from the Integers, there should be super-classes that deal with just the graded part or just the pred/succ part.

If the above were the case then I think Enum would make a lot of sense.

I realize that Integer and Int aren't the same, so maybe SmallEnum or something similar would be needed that specifies that Int is big enough for all values to be represented.

tysonzero avatar Dec 06 '18 05:12 tysonzero

  • Replace Enum a to Coercible Int a.

This approach is being used in the wild at https://github.com/ejconlon/int-like/blob/master/src/IntLike/Map.hs

But I would love to see containers' existing data IntMap a migrated to data IntCoercibleMap i a and type IntMap = IntCoercibleMap Int. The containers API is much richer than the one exposed by int-like: more functions, and both strict and lazy versions. It'd be less maintenance overall to define everything once for IntCoercibleMap than to maintain API parity with a newtype wrapper in a different package. And consumers could choose to write functions targeting IntCoercibleMap and they'd be usable with IntMap as well. I think the case is stronger for this than with the Enum version of the proposal, as it's so much less intrusive to coerce the specialized IntMap functions to general IntCoercibleMap ones than it is to thread fromEnum and toEnum everywhere. IntSet too, mutatis mutandis.

I'm not holding my breath or anything; just saying I'd love to see it.

rhendric avatar Aug 19 '23 05:08 rhendric

I wouldn't want that getting in the internals (with the possible exception of traversals). Would you be interested in giving it a go?

treeowl avatar Aug 19 '23 08:08 treeowl

Either your first sentence is agreeing with me that the Coercible approach is better than throwing the Enum functions around, or you're offering a refinement of my proposal about which I need some more detail to properly understand. :smile: Either way, I'd be happy to contribute a PR that moves things forward if you think the idea is worth a shot.

rhendric avatar Aug 19 '23 08:08 rhendric

There is https://hackage.haskell.org/package/enummapset, offering quite extensive API.

Bodigrim avatar Aug 19 '23 08:08 Bodigrim

There is https://hackage.haskell.org/package/enummapset, offering quite extensive API.

That's what I said in 2016: https://github.com/haskell/containers/issues/210#issuecomment-228585953

amigalemming avatar Aug 19 '23 14:08 amigalemming

And large as it is, it still suffers from not being in containers.

‘This is a fully general argument against things not being in core libraries,’ you say? Well, maybe it is. But it's a much stronger argument IMO when the thing we're discussing being in core libraries is identical up to coercion to something that's already in core libraries.

rhendric avatar Aug 19 '23 18:08 rhendric