swift-collections icon indicating copy to clipboard operation
swift-collections copied to clipboard

Counted Set

Open ladd opened this issue 3 years ago • 3 comments

How about a counted set/multiset type? These are pretty commonly used, and can be quite handy:

https://developer.apple.com/documentation/foundation/nscountedset https://developer.apple.com/documentation/corefoundation/cfbag-s1l

ladd avatar Apr 12 '21 18:04 ladd

Yep -- counted multisets are useful, and they would be an interesting API design exercise!

I think an unordered counted multiset that stores its contents in a Dictionary might be a relatively straightforward addition. (I wonder if they can be implemented better than just a dictionary mapping set elements to their multiplicities -- perhaps there is some optimization opportunity there, too.)

An ordered multiset would likely not be counting -- it would need to preserve duplicate items so that it can keep them in the desired order. This would be much like OrderedSet except without the uniqueness requirement. (I.e., the benefit over Array would be that it provided a fast way to find all instances of particular items.) Unfortunately, our existing hash table implementations can't handle duplicate keys well, so we'd need to switch away from open addressing with linear probing. (Which isn't necessarily a big deal, but it does lead to concerns about memory allocation overhead & general performance.)

We already have plans to use the B-tree implementation that we develop in #1 to add a sorted multiset type. This too would keep duplicates as is, rather than counting them.

lorentey avatar Apr 14 '21 01:04 lorentey

Part of the API design challenge is to figure out how to expose the multiplicities while also keeping the interface familiar to Swift users. (SetAlgebra definitely can't handle multisets, but its high-level operations translate pretty well, I think.)

API design-wise, I expect a counted multiset will have a very different interface than a dupe-preserving one. For example, would CountedUnorderedMultiset be a Collection? If so, what would be its Element type?

lorentey avatar Apr 14 '21 01:04 lorentey

As CountedSet (aka CountedUnorderedMultiset) is conceptually a Dictionary<Key, Int>, I think it would make sense to have an equivalent Element type: (Key, Int). Similarly, I think it would be OK for CountedSet to conform to Collection like Dictionary. However, arguably Dictionary's direct collection conformance is a mistake, so maybe CountedSet shouldn't follow its precedent there (we could vend a collection view instead).

kylemacomber avatar Apr 14 '21 13:04 kylemacomber