swift-collections
swift-collections copied to clipboard
Counted Set
Description
A new implicit data structure has been added: an unsigned multiset. This has been discussed extensively on the Swift Forums, most notably here and here.
Detailed Design
Since this is an implicit data structure, I have introduced conformance to RawRepresentable
, where the RawValue
is Dictionary<Element, UInt>
. I have used UInt
instead of Int
to make it clearer that this is not a signed multiset, which may be added in the future.
A potential point of controversy: I have implemented CountedSet.underestimatedCount
as the number of unique elements, since that is indeed guaranteed to be less than or equal to its cardinality.
CountedSet
implements both Collection
and SetAlgebra
, among other protocols.
This is not quite ready for release yet, in my estimation, but the contribution guidelines encourage early pull requests, and the core functionality has been implemented.
Work needs to be done on the Codable
implementation: in particular, it should be possible to decode a flat list of objects into a CountedSet
, as one of the intended uses of the structure is to act as an unordered Array
.
Representations for Mirror, String, and debugging also need to be customized.
Additional API may be warranted as well, since a lot of functionality (like the count of a given element) is currently accessible only by reading the underlying dictionary directly.
/// An unordered, counted multiset.
@frozen
public struct CountedSet<Element: Hashable>: RawRepresentable {
// Allows internal setter to be referenced from inlined code
@usableFromInline
internal var _storage = RawValue()
@inlinable @inline(__always)
public var rawValue: [Element: UInt] { _storage }
@inlinable
public var isEmpty: Bool { rawValue.isEmpty }
/// Creates an empty counted set with preallocated space for at least the
/// specified number of unique elements.
///
/// - Parameter minimumCapacity: The minimum number of elements that the
/// newly created counted set should be able to store without reallocating
/// its storage buffer.
@inlinable
public init(minimumCapacity: Int) {
self._storage = .init(minimumCapacity: minimumCapacity)
}
@inlinable
public init?(rawValue: [Element: UInt]) {
guard rawValue.values.allSatisfy({ $0 > .zero }) else { return nil }
_storage = rawValue
}
Documentation
I’ve taken care to document all new API at the symbol-level, particularly where it may be confusing to users. However, some of it may benefit from additional usage examples and guidance.
I have not updated the Documentation folder guides yet.
Testing
There is complete test coverage for the Sequence
and SetAlgebra
implementations, including dedicated tests for each axiom mandated by SetAlgebra
. Coverage for other elements is more spotty, but does exist.
Performance
I have not implemented performance benchmarks yet.
Source Impact
This is a purely additive change, so it does not break any existing API.
Checklist
- [x] I've read the Contribution Guidelines
- [x] My contributions are licensed under the Swift license.
- [x] I've followed the coding style of the rest of the project.
- [ ] I've added tests covering all new code paths my change adds to the project (to the extent possible).
- [ ] I've added benchmarks covering new functionality (if appropriate).
- [x] I've verified that my change does not break any existing tests or introduce unexpected benchmark regressions.
- [ ] I've updated the documentation (if appropriate).
Thank you for the detailed feedback, I really appreciate it!
A proper multiset should provide a direct operation to query the multiplicity of a particular member. I.e., we need a generalization of
contains
that returns an integer instead of a boolean value.
I’ve come up with a few different candidates for how this should be expressed, and I’d like feedback on which to use.
A. CountedMultiset.multiplicity(of:)
B. CountedMultiset.count(of:)
C. CountedMultiset.subscript(_:)
I’m favoring A, personally.
It's nice to use the same term consistently where possible. Since this type is being called a counted multiset, it makes sense to me that we should be able to ask about the count of something.
It's nice to use the same term consistently where possible. Since this type is being called a counted multiset, it makes sense to me that we should be able to ask about the count of something.
That’s true, but Swift usually uses count
to mean cardinality, and that’s not what this would be. I’m concerned it could be easily confused with CountedMultiset.count
, the sum of all multiplicities and the cardinality of the collection.
You raise an excellent point about the name, though: I was using CountedSet
mainly because Foundation’s (terrible) implementation does. @lorentey said it should be named CountedMultiset
instead, but now that I think about it that seems redundant at best and confusing at worst.
I don’t think there is any precedent for the name “counted multiset", and even “counted set" doesn’t seem to show up outside the context of Foundation and derivations upon it. The most common term in academia and other programming languages, by far, is “multiset”, followed by “bag”.
@lorentey: Instead of CountedSet
or CountedMultiset
, how about simply Multiset
? That seems a lot clearer, and lends itself naturally to proposed variations like SortedMultiset
or even SignedMultiset
.