swift-collections `BitSet` API Review

`BitSet` API Review

Open benrimmington opened this issue 2 years ago • 25 comments

I missed the GSoC 2021 topic on the forums, so I'd like to post my feedback here instead.

I'm working on similar types (SetOfUInt8 and SetOfUInt16), so I have a few suggestions for BitSet.

Could the collection APIs be moved to a separate type?
```
extension BitSet {
  public func sorted() -> SortedView
}
```
This method would pre-compute the startIndex and index(before: endIndex) into stored properties of the view.

Could the membership APIs also be combined as a custom subscript?

extension BitSet {
  public subscript<I: FixedWidthInteger>(contains member: I) -> Bool {
    get {
      contains(member)
    }
    set {
      if newValue {
        insert(member)
      } else {
        remove(member)
      }
    }
  }
}

Could the _Word type have OptionSet conformance, for some default implementations?
Should the spaces in source file names be removed?

(There's also an inconsistency with the BItSet+Invariants file name.)
Could some of the tests be extracted into a SetAlgebra conformance checker?

(I haven't reviewed the tests, but I think line 115 is incorrect.)

Mar 27 '22 05:03 benrimmington

Can you provide your reasoning behind these suggestions?

Could the [collection APIs][] be moved to a separate type?
extension BitSet {
  public func sorted() -> SortedView
}
This method would pre-compute the startIndex and index(before: endIndex) into stored properties of the view.

I don't think moving the collection conformance to a view would be reasonable for BitSet. I had to do that for OrderedDictionary to prevent a type conflict with Int keys vs Int indices, but we don't have that design problem here. (In OrderedDictionary's case, after much vacillation, we decided that having integer indices was more important than having the collection conformance on the default view.)

We can certainly precompute startIndex -- the big question there is if that will actually be helpful overall. I already have some doubts that precomputing count is worth the overhead, but to be fair startIndex would be far cheaper.

I don't see a reason to precompute index(before: endIndex), as that is already a constant-time operation.

Could the [membership APIs][] also be combined as a custom subscript?

Sure, it can be. But I see nothing about BitSet in particular that would make this operation particularly relevant to it -- why not have it as a SetAlgebra extension?

Could the [_Word][] type have OptionSet conformance, for some default implementations?

What for? _Word is not a public type, and I don't think it should be made public.

Should the spaces in source file names be removed?

For what reason?

(There's also an inconsistency with the [BItSet+Invariants][] file name.)

Good catch, thanks.

Could some of the tests be extracted into a SetAlgebra [conformance checker][]?

Yep, creating a SetAlgebra checker would be a useful project. Unfortunately, these tests are bumping into problems with combinatorial explosions, and I don't think they can be directly generalized. I am extremely unsatisfied with the idea of withInterestingSets, and I'd like to replace it with some better sampling scheme.

(I haven't reviewed the tests, but I think [line 115][] is incorrect.)

Thanks!

Jun 19 '22 23:06 lorentey

We can certainly precompute startIndex -- the big question there is if that will actually be helpful overall.

I don't understand the benchmark in the original API review. Does the input size refer to the number of elements, or did both sets contain a single element of increasing value?

Could the membership APIs also be combined as a custom subscript?

Sure, it can be. But I see nothing about BitSet in particular that would make this operation particularly relevant to it -- why not have it as a SetAlgebra extension?

This suggestion was partly in response to feedback on the discardable results of the insert, update, and remove methods. A subscript wouldn't need to perform the discardable work, but I'm not sure about proposing a SetAlgebra extension or requirement. It also seems more relevant to bit sets, because of the similarity with using a BitArray subscript.

Should the spaces in source file names be removed?

For what reason?

Some tools don't handle the spaces properly.

Jun 20 '22 18:06 benrimmington

We can certainly precompute startIndex -- the big question there is if that will actually be helpful overall. I don't understand the benchmark in the original API review. Does the input size refer to the number of elements, or did both sets contain a single element of increasing value?

We have not yet done the work of setting up benchmarks for these two data structures -- that's one reason they're still on a branch. (The other is the lack of documentation.) The benchmark results in the original proposal do not reflect the current implementation.

This suggestion was partly in response to feedback on the discardable results of the insert, update, and remove methods. A subscript wouldn't need to perform the discardable work, but I'm not sure about proposing a SetAlgebra extension or requirement. It also seems more relevant to bit sets, because of the similarity with using a BitArray subscript.

Yep, this is a reasonable suggestion, for SetAlgebra. We'll need to quantify the performance benefits of eliminating the return values; if the hypothesis about the perf improvement holds true, and it's significant enough to be worth the effort of going through S-E, then we should consider this for addition to SetAlgebra.

Meanwhile, provided that benchmarks indicate a benefit we cannot achieve by other means (such as making the relevant code inlinable), we can add this subscript to all SetAlgebra implementations within this package. (Currently OrderedSet.UnorderedView and BitSet, with SortedSet and PersistentSet in various stages of readiness. While OrderedSet cannot conform to SetAlgebra, we will want this operation there too.)

(The existence of BitArray's subscript does not, by itself, indicate to me that we also need this operation on BitSet.)

Some tools don't handle the spaces properly.

Which tools?

Jun 20 '22 19:06 lorentey

What do you think about BitSet.insert and similar operations being generic over FixedWidthInteger, rather than simply taking Int values?

(Consider that BitSet is a collection of Ints -- it does not support accessing its elements as any other type. Also consider that BitArray does not provide the same flexibility in its own mutation operations -- BitArray.append is not a generic operation.)

Jun 20 '22 19:06 lorentey

What's your opinion on BitArray's bitPattern initializer?

Should BitArray provide a way to go the other way, i.e., initialize a binary integer based on the contents of a bitarray?

Jun 20 '22 19:06 lorentey

We currently have a BitArray initializer that takes a BitSet, but we don't have an initializer to go the other way. We probably should!

These two types provide two different perspectives on the same underlying data structure. Ideally it ought to be possible to treat them as views of each other, making it easy to switch between the two sets of APIs as needed. However, BitSet does have an additional invariant (_storage.isEmpty || !_storage.last!.isEmpty) that makes the BitArray→BitSet→BitArray round trip lossy. This makes it unlikely we'd want to provide a bitset view on BitArray, although we could have an in-place mutable bitarray view on BitSet.

The goal of the extra invariant is to allow better memory management for BitSet -- shrinking the storage buffer when possible. However, this is currently left unimplemented, as Array.removeLast(_:) never reduces the array's capacity.

One way to unify the invariants of these two types would be to give up on BitSet insisting on the last word being non-empty. This would complicate equality checks & hashing, but it would allow us to transparently move between the two APIs to fit the use case at hand. We can still provide the shrink operation as an explicit public method in this case.

Allowing two equal bitsets to differ in the size of their storage can be somewhat problematic though, as a == b would no longer imply that BitArray(a) == BitArray(b). However, this is no different to String and Double having non-unique representations for the same value.

Jun 20 '22 19:06 lorentey

What about naming? I'm pretty sure bitset & bitarray work best as single words (e.g., C++ has std::bitset, not std::bit_set), but we're currently camelcasing them in their type names.

Jun 20 '22 19:06 lorentey

Yep, this is a reasonable suggestion, for SetAlgebra.

Unfortunately, it's not obvious whether to use a conditional insert(_:) or an unconditional update(with:) operation.

Some tools don't handle the spaces properly.

Which tools?

I've seen reports on the forums, but I can't remember the details. Mishandling of spaces is a recurring issue.

https://github.com/apple/swift/issues?q=is:issue+label:bug+in:title+spaces

What's your opinion on BitArray's bitPattern initializer?

It will be more useful when we have BigInt, StaticBigInt, etc.

Should there be a precondition that the integer value is unsigned or nonnegative?
Should there also be a BitArray.init(words: some Sequence<UInt>) initializer?

Should BitArray provide a way to go the other way, i.e., initialize a binary integer based on the contents of a bitarray?

This reminds me of your SR-7648 feature (which needs an isNegative or isSigned parameter).

I'll think about the other questions.

Jun 21 '22 10:06 benrimmington

These two types provide two different perspectives on the same underlying data structure. Ideally it ought to be possible to treat them as views of each other, making it easy to switch between the two sets of APIs as needed.

Instead of having two top-level types, can one of them become a nested type?

For example, move BitSet into BitArray.IndexSet, and don't allow it to expand or shrink the array.

let count = 256 * 256  //-> 65536
var value = BitArray(repeating: false, count: count)

value.nonzeroBits.isEmpty            //-> true
value.nonzeroBits.insert(count - 1)  //-> (inserted: true, memberAfterInsert: 65535)
value.nonzeroBits.insert(count)      //-> Fatal error: out of bounds.

What do you think about BitSet.insert and similar operations being generic over FixedWidthInteger, rather than simply taking Int values?

~~If the generic operations are useful, can they be added to a SetAlgebra<some FixedWidthInteger> extension?~~

On second thought, I don't like the heterogeneous comparison operators, because they complicate generic code. Therefore, I'm against generic operations on BitSet or SetAlgebra, and I'd prefer to wait for implicit integer widening.

What about naming? I'm pretty sure bitset & bitarray work best as single words (e.g., C++ has std::bitset, not std::bit_set), but we're currently camelcasing them in their type names.

Other libraries use camel case:

D has a BitArray struct, and a lazy bitsSet range.
Java has a BitSet class.

Jun 24 '22 07:06 benrimmington

On second thought, I don't like the heterogeneous comparison operators, because they complicate generic code. Therefore, I'm against generic operations on BitSet or SetAlgebra, and I'd prefer to wait for implicit integer widening.

I'm not sure I follow how heterogeneous comparison operators reflect the ergonomics of inserting sequences of bits represented by fixed-width integers of varying bit widths?

Jun 25 '22 03:06 xwu

I'm not sure I follow how heterogeneous comparison operators reflect the ergonomics of inserting sequences of bits represented by fixed-width integers of varying bit widths?

My argument was badly worded, but I was trying to say that we should wait for a more general solution.

I've just noticed that the generic BitSet.insert(_:) is fulfilling the protocol requirement. I'd wrongly assumed that it was an overload. (The precondition message is incorrect, e.g. UInt(exactly: UInt64.max) on a 32-bit arch.)

Jun 25 '22 04:06 benrimmington

(On providing a labeled subscript in SetAlgebra)

Unfortunately, it's not obvious whether to use a conditional insert(_:) or an unconditional update(with:) operation.

The someSet[member: foo] = true syntax reads like an unconditional assignment, so I would be less surprised if it did update(with:) than insert(_:). However, this is ultimately a minor matter that should be specified in the API docs.

I've seen reports on the forums, but I can't remember the details. Mishandling of spaces is a recurring issue.

Apologies, but my position is going to be quite rude: We are in the year 2022, and this has been a solved problem for many, many decades. Tools that can't handle spaces in file (or directory) names are broken, and they need to be fixed. There are no valid excuses.

Please do file a bug if the human readable file names in this package cause any concrete issues in Swift tooling, so that those tools can be fixed quickly. (The spaces are going to stay, though.)

What's your opinion on BitArray's bitPattern initializer? It will be more useful when we have BigInt, StaticBigInt, etc.

There are multiple implementations for big integers, we just don't have any in the Standard Library. BinaryInteger is a standard protocol, it is okay to write algorithms for it.

Should there be a precondition that the integer value is unsigned or nonnegative?

I don't think so! All signed binary integer types are expected to implement a twos complement representation (or to at least to pretend that they do so in their conversion methods); I think it would be very much desirable to make full use of this fact when converting between integers and bit sequences.

Should there also be a BitArray.init(words: some Sequence<UInt>) initializer?

Oh, yes! We definitely need that -- it's the primitive operation. (We also badly need that as a BinaryInteger requirement.)

These two types provide two different perspectives on the same underlying data structure. Ideally it ought to be possible to treat them as views of each other, making it easy to switch between the two sets of APIs as needed.

Instead of having two top-level types, can one of them become a nested type?

For example, move BitSet into BitArray.IndexSet, and don't allow it to expand or shrink the array.

So here is my thinking about that: from my personal experience, I find BitSet to be far more useful than BitArray, and if we went this way, I'd push very hard to make BitSet the top-level type. (People have been politely but firmly insisting on the primacy of the BitArray view, without (as I had to notice) ever really backing that choice with any concrete use cases or much lived experience.)

But it feels weird to rigidly prefer one or the other -- these are really just two sides of the same coin. Even though I think BitSet is generally more useful, I wouldn't want to force people to spell BitArray as BitSet.ArrayView or some similar silly nested name: there is no inherent hierarchy here, and there are plenty of legitimate cases where the array view is the right choice. (They just seem to be less frequent to me.)

On second thought, I don't like the heterogeneous comparison operators, because they complicate generic code. Therefore, I'm against generic operations on BitSet or SetAlgebra, and I'd prefer to wait for implicit integer widening.

I'm not sure I get this argument. I'm not holding my breath for implicit integer widening, and I don't really see how it would relate to this type -- I would not expect integer widening to allow implicit conversions between signed/unsigned integer types, for example.

I feel a bit ambivalent about the generic methods on BitSet because they seem useful, but they are weirdly lopsided -- they do allow more convenient use when inserting/removing items and when calling contains, but BitSet still requires clients to perform annoying explicit conversions when iterating over the contents. (And this would remain true even if Swift gained implicit integer widening operations -- at least to deal with signedness mismatches.)

The natural alternative would be for BitSet to have Element as a generic type argument, and to forward its implementations to some core, non-generic _BitSet type that deals with UInt elements. However, our current inability to provide default values for type arguments makes this choice somewhat unpalatable: I don't want to force people to spell BitSet<Int> to use this type -- the extra degree of freedom isn't necessarily welcome.

(If we did end up with BitSet<Int>, I'd expect BitArray would also gain a type parameter, specifying its index type. As noted above, I want to avoid making either of these (but especially BitSet) less convenient to use than the other in the overwhelmingly common case when Element/Index is Int.)

Jun 28 '22 21:06 lorentey

What about naming? I'm pretty sure bitset & bitarray work best as single words (e.g., C++ has std::bitset, not std::bit_set), but we're currently camelcasing them in their type names.

Other libraries use camel case:

D has a BitArray struct, and a lazy bitsSet range.

Java has a BitSet class.

That's fair. Most data structure references also prefer "bit vector", "bit array" and "bit set", although "bitmap" is usually spelled as a single word.

Jun 28 '22 21:06 lorentey

Update: on reflection, I found an important use case for member-subscript operations in BitSet, so I went ahead and added them.

extension BitSet {
  public subscript(member member: Int) -> Bool { get set }
  public subscript(members bounds: Range<Int>) -> Slice<BitSet> { get }
  public subscript<R: RangeExpression>(members bounds: R) -> Slice<BitSet> where R.Bound == Int { get }
}

Counter-intuitively, the individual member lookup/assignment operation foo[member: 42] does not actually provide any performance benefit in BitSet's case, as this type has a precalculated count, and so inserting/removing an item through this subscript still needs to check whether the item used to be in the collection.

However, as BitSet is a sorted collection, the corresponding (read-only) slicing subscript provides functionality that would otherwise require far more complicated API surface: among other things, it allows us to elegantly find the closest set member at/above/below a particular integer value.

  /// Accesses the contiguous subrange of the collection’s elements that are
  /// contained within a specific integer range.
  ///
  ///     let bits: BitSet = [2, 5, 6, 8, 9]
  ///     let a = bits[3..<7] // [5, 6]
  ///     let b = bits[4...] // [5, 6, 8, 9]
  ///     let c = bits[..<8] // [2, 5, 6]
  ///
  /// This enables you to easily find the closest set member to any integer
  /// value.
  ///
  ///     let firstMemberNotLessThanFive = bits[5...].first // Optional(6)
  ///     let lastMemberBelowFive = bits[..<5].last // Optional(2)
  ///
  /// - Complexity: Equivalent to two invocations of `index(after:)`.
  public subscript(members bounds: Range<Int>) -> Slice<BitSet> { ... }

(I'm vacillating on whether we should have a variant of BitSet that does not have a pre-calculated count (with the name BitSet.Uncounted or some such); if we do end up adding that, then the individual member-based subscript could become slightly faster for that type than regular insert/remove invocations.)

https://github.com/apple/swift-collections/blob/feature/BitSet/Sources/BitCollections/BitSet/BitSet%2BExtras.swift#L13-L90

Aug 29 '22 23:08 lorentey

I think the Boolean subscript looks better with a contains: label.

bits[contains: member] = true

The other subscripts could have an intersection: label.

let lastMemberBelowFive = bits[intersection: ..<5].last

Line 42 of BitSet+Extras.swift is missing ///.
Lines 72–80 are missing the members: label.

Aug 30 '22 14:08 benrimmington

... and now BitSet no longer keeps a running total of its element count, so the member-subscript is potentially faster. (This also gets rid of the need to update the count after bulk set operations, which was a significant bottleneck.)

Keeping a precalculated count is important for some use cases though, so I added a separate BitSet.Counted type that implements it. (I have a feeling this lily is getting overly gilded at this point, though -- we may well end up only ever shipping the uncounted BitSet (perhaps not even BitArray).)

Aug 31 '22 20:08 lorentey

You've missed a couple of doc fixes — the members: labels, and Optional(5):

  ///     let firstMemberNotLessThanFive = bits[5...].first // Optional(6)
  ///     let lastMemberBelowFive = bits[..<5].last // Optional(2)

However, if the BitSet.Index was Int, then could the Collection subscripts be implemented to work as above, by accepting any range of elements/indices? Would it matter that the generic subscript isn't a protocol requirement? I've read your reasoning behind the opaque index type, but would it be useful to have BitArray (if it survives) and BitSet share their indices?

Sep 01 '22 07:09 benrimmington

For the Codable conformances:

  /// Bit arrays are encoded as an unkeyed container of `UInt` values,
  /// representing the total number of bits in the array, followed by
  /// UInt-sized pieces of the underlying bitmap.

  /// Bit sets are encoded as an unkeyed container of `UInt` values,
  /// representing pieces of the underlying bitmap.

this will make archives incompatible between 32-bit and 64-bit platforms.

Sep 01 '22 18:09 benrimmington

Keeping a precalculated count is important for some use cases though, so I added a separate BitSet.Counted type that implements it.

"Multisets" or "bags" are sometimes called "counted sets", so is there a better name than BitSet.Counted?

Sep 01 '22 19:09 benrimmington

this will make archives incompatible between 32-bit and 64-bit platforms.

D'oh. Very nice catch, thank you! The right thing to do would be to switch to consistently using UInt32 or UInt64 in the encoding forms.

(The obvious thing to do would be to encode bit arrays as strings of the form "1011011", and bit sets as unkeyed containers of their member values: [0, 1, 3, 4, 6]. But I feel efficiency of the encoded form is more important than readability, so I'd prefer something compact that can be encoded/decoded as directly as possible.)

(Edit:) We could also use a compact(ish) string encoding such as (generalized) base64 -- that would have equal overhead whether the serialization target is a text or binary format. However, I don't know how I feel about the idea of adding a base64 encoder/decoder to this package. (Encoding bitmaps as integer values has rather high storage (and time) overhead if the output is a text format like JSON. Then again, I expect compression would fix storage overhead, and the time overhead of decimal conversion might be eclipsed by the need to parse syntax, at least on the decoding side.)

Sep 05 '22 00:09 lorentey

"Multisets" or "bags" are sometimes called "counted sets", so is there a better name than BitSet.Counted?

This is a fair point, but I don't think it'd be a true source of confusion in actual practice. There is no such thing as a "multi bit set" ("bit multiset"? "bitbag"?), and the way the type is nested under BitSet very strongly hints at its variant nature.

(To nitpick, "counted set" and "multiset" aren't interchangeable terms outside abstract mathematics. "Counted set" refers to a particular implementation of a multiset where duplicates aren't preserved -- so it is a more specific term.)

That said, suggestions for naming alternatives would be welcome. (Note: brevity is a virtue.)

Sep 05 '22 00:09 lorentey

However, if the BitSet.Index was Int, then could the Collection subscripts be implemented to work as above, by accepting any range of elements/indices?

The premise here is unacceptable! 😄 If a Collection has integer indices, then its indices must form a contiguous range. LazyFilterCollection in the Standard Library is a disgusting abomination that only serves to illustrate the importance of this rule -- people keep assuming that the indices of array.lazy.filter { ... } work like array indices, leading to an endless source of bugs.

Note: this isn't an explicit Standard Library requirement for the Collection protocol, although the stdlib does assume it holds true for RandomAccessCollection types.

The general convention I'm enforcing is that if a Collection's Index is a Strideable type, then items.index(after: i) must be semantically equivalent to i.advanced(by: 1) (modulo bounds checking and other validation steps). No ifs, no buts -- doing anything else would be far too error-prone.

(A (weaker) corollary of this rule is that collections with stridable indices ought to be random-access collections. We will see if this corollary survives the addition of a tree-based List type.)

Would it matter that the generic subscript isn't a protocol requirement? I've read your reasoning behind the opaque index type, but would it be useful to have BitArray (if it survives) and BitSet share their indices?

I really do not think so. I would be open to allowing easier conversions between BitArray's integer indices and BitSet's opaque indices, but using the same type would be a nonstarter for me. These are two very different collection types, even though they share the same underlying data structure.

However, I do not have much affection for the member labels -- I would be also open to removing them. The label isn't strictly necessary in BitSet's case (nor it is for Set or even the upcoming B-tree based SortedSet) but omitting it could prevent us from having consistent API over similar collection types that we might want to add later. As a trivial example, we wouldn't be able to add an unlabeled subscript(member: Element) operation to OrderedSet, because that might lead to an ambiguity with the indexing subscript, which is semantically a very different operation. (See also the reason why we ended up not declaring OrderedDictionary to be a Collection.)

The range-expression-taking members: subscript is a particularly important operation in any sorted set -- so much so that I'd expect it to be a core requirement in any hypothetical SortedCollection/SortedSet protocol. (FWIW, my original SortedSet implementation provided a rather clumsy set of operations to do this -- using range expressions is a far superior solution.)

I decided to add the members: label so that we have a name that can be consistently used in any sorted [multi]set, including ones where Index can happen to be the same type as Element. However, I don't think this is a pressing issue: the only sorted collection I can think of where this might be a real problem is a [multi]set based on a sorted array, and I don't think sorted arrays are generally useful. (Using search trees such as B-trees is the right way to go, and those will not have integer indices.)

Sep 05 '22 01:09 lorentey

and bit sets as unkeyed containers of their member values: [0, 1, 3, 4, 6].

This would make it possible to encode with BitSet and then decode with SortedSet<Int>.

However, I don't know how I feel about the idea of adding a base64 encoder/decoder to this package.

I dislike the way Base64 reorders the bits. For example, mouse-over the diagram in Fast Base64 encoding with SSE vectorization.

That said, suggestions for naming alternatives would be welcome. (Note: brevity is a virtue.)

I can't think of a better name than BitSet.Counted, but I did wonder if a generic type or property wrapper is possible. For example, I might also want to cache/memoize the number of characters in a string.

Sep 05 '22 12:09 benrimmington

Note: this isn't an explicit Standard Library requirement for the Collection protocol, although the stdlib does assume it holds true for RandomAccessCollection types.

That is only the default implementation when users specify an index type that meets the constraints but don't opt for custom behavior. A type can perfectly well implement index(before:) and index(after:) to use only even integer indices and conform correctly to RandomAccessCollection, unless I'm mistaken.

I do agree this tends to be very confusing to end users though :) The same thing can be said for the bugs that arise when users assume indices begin at 0.

Sep 05 '22 20:09 xwu

Should any of the following types be @frozen?

Line 165 of BitSet+Extras.swift to match the stdlib APIs?

-  public func filter(
+  public __consuming func filter(

Line 347 of Shared/_Word.swift — how can this be tested?

-      let v = try container.decode(UInt.self)
+      let v = try container.decode(UInt64.self)

Sep 13 '22 03:09 benrimmington

This would make it possible to encode with BitSet and then decode with SortedSet<Int>.

Yes, but having equivalent codable representations for distinct types is at best a nice-to-have feature, never a requirement. In the case of BitSet, encoding its contents as a series of its member integers would be wildly less efficient than the current choice. As always, clients who prefer/need a different encoding are very welcome to customize it by defining their own wrapper types.

I can't think of a better name than BitSet.Counted, but I did wonder if a generic type or property wrapper is possible. For example, I might also want to cache/memoize the number of characters in a string.

I think those would be very interesting experiments to do outside of this package!

Note: this isn't an explicit Standard Library requirement for the Collection protocol, although the stdlib does assume it holds true for RandomAccessCollection types.

That is only the default implementation when users specify an index type that meets the constraints but don't opt for custom behavior. A type can perfectly well implement index(before:) and index(after:) to use only even integer indices and conform correctly to RandomAccessCollection, unless I'm mistaken.

I don't really see your point here -- if the stdlib's default implementations only work correctly in cases that satisfy some condition that does not derive from stated requirements, then for all intents and purposes the stdlib does assume that those conditions will hold. These assumptions need to be clearly documented.

(Another case like this is the fact that Slice implicitly assumes that range-replacing operations in the base collection will not invalidate indices preceding the mutated subrange. This does not derive from RangeReplaceableCollection requirements.)

Should any of the following types be @frozen?

Maybe! This package isn't ABI stable, but previously we added @frozen as a courtesy to folks who distribute this package in binary form (as part of their internal build process). The library is technically built with library evolution enabled in these cases, but there are (hopefully) no ABI compatibility expectations whatsoever.

(The presence of the attribute doesn't carry any semantic promise that we won't change the memory layout in future releases -- it just lets these developers to get a calling convention that matches the package's normal behavior.)

Line 165 of BitSet+Extras.swift to match the stdlib APIs?
-  public func filter(
+  public __consuming func filter(

Hm, this is a good question. For Set or a similar generic collection type, the idea with adding __consuming was to prepare for the case where Element is a move-only type -- in which case the filter operation wouldn't be able to make copies of any members; it would need to destroy the original collection instead. (Let's ignore the fact that the implementation wouldn't work for move-only elements anyway -- the stdlib certainly did. 😇)

In BitSet's case, this is moot; I believe the only effect of __consuming would be to change the calling convention when passing self to +1, which would be undesirable here. (It wouldn't make a big difference, but it would often add an unnecessary retain/release pair.)

Line 347 of Shared/_Word.swift — how can this be tested?
-      let v = try container.decode(UInt.self)
+      let v = try container.decode(UInt64.self)

Hm, the MinimalEncoder/Decoder facility ought to catch such type mismatches. We'll need to take a look at what's going on.

Sep 25 '22 01:09 lorentey

I don't really see your point here -- if the stdlib's default implementations only work correctly in cases that satisfy some condition that does not derive from stated requirements, then for all intents and purposes the stdlib does assume that those conditions will hold. These assumptions need to be clearly documented.

That's not what I'm saying. When certain conditions are met regarding the Index associated type, the standard library supplies default implementations of index(before:) and index(after:) for RandomAccessCollection conforming types that advance the index backward or forward by 1; I'm not aware of any places where it (the standard library) assumes or only works correctly when that condition holds, though there are sure to be a great number of instances where end users' code does make that assumption.

Sep 25 '22 01:09 xwu

Sure.

Sep 25 '22 08:09 lorentey

Hm, the MinimalEncoder/Decoder facility ought to catch such type mismatches. We'll need to take a look at what's going on.

Ah, it's all good! This is on a UInt.bitWidth == 32 branch, but it is covered by the existing tests, so it'd been caught by prerelease testing on watchOS devices at least.

Very well spotted! 👍

Sep 25 '22 08:09 lorentey

Can this issue be closed now?

I'll take a closer look at the implementation when BitCollections are reviewed on the forums.

Mar 27 '23 21:03 benrimmington

swift-collections swift-collections copied to clipboard

`BitSet` API Review

swift-collections
swift-collections copied to clipboard