swift-collections icon indicating copy to clipboard operation
swift-collections copied to clipboard

Ordered collection with stable indices

Open kyouko-taiga opened this issue 2 years ago • 3 comments

Hello everyone,

I'd like to suggest adding an ordered collection similar to Swift's Array, but with stable indices.

What am I suggesting?

Like many collections, Array stores elements in contiguous storage, for well-known performance reasons. The issue with that strategy, however, is that the index of an element is not stable. By stable, I mean that inserting or removing elements will invalidate indices of existing elements.

var a: Array = ["a", "b", "c"]
let i = a.firstIndex(of: "c")!
a.remove(at: 1)
print(a[i]) // Fatal error: Index out of range

In contrast, a collection with stable indices would guarantee that the index of an element does not change if other elements are inserted or removed.

var a: StableArray = ["a", "b", "c"]
let i = a.firstIndex(of: "c")!
a.remove(at: 1)
print(a[i]) // Prints "c"

How is it useful?

Stable indices can be used to store the identity of a value in another data structure. For instance, one may wish to store some large elements in a dynamically resizable container and refer to them in an adjacency list, to represent a graph connecting these elements.

As a concrete example, I use such a container to represent basic blocks in an LLVM-like IR. Indices keep track of the position of an instruction in its basic block and are stored in various data structures, such as the def-use chain of an instruction. They need to be stable so that inserting a new instruction into a basic block does not change the index of any other instruction.

As a result, basic blocks can adopt (mutable) value semantics. An instruction is merely a tag with an immutable list of operands, uniquely identified by its index in a basic block.

Can we use existing data structures?

We can get close to an ordered collection with stable indices using hash maps.

Using Dictionary

A Dictionary has stable indices but does not preserve the order in which elements have been inserted. This problem could be solved by choosing keys such that they have a total order, but that approach would suffer from two issues:

First, it would require an ad-hoc key generator that is guaranteed to produce unique keys. This generator would also have to keep track of a total order between these keys, should insertion at random position be supported. One trick to solve that latter issue could be to choose keys from a dense range (e.g., rational numbers).

Second, there would be no obvious way to iterate over the dictionary in order, or determine the distance between two (stable) indices, or offset an index by a specific distance. Since the dictionary is unordered, distances would have to be computed using an ad-hoc data structure. The simplest approach would be to sort the dictionary's keys. A more efficient solution would be to have a cache maintained by the generator.

Using OrderedDictionary

OrderedDictionary solves part of the problem by preserving the order in which elements are inserted. In fact, its implementation already covers most of the aforementioned ad-hoc objects required to use Swift's Dictionary. Unfortunately, it still suffers from two issues:

First, as far as I am aware, OrderedDictionary does not support insertion at a random position. It also does not offer a method to retrieve the first/last key of a particular value (i.e., equivalences for firstIndex(of:) and lastIndex(of:) of Swift's ReversibleCollection). As a result, there is no way to compute distances with that data structure alone.

Second, even if the aforementioned feature was actually available, an ad-hoc object would still be required to generate unique keys, just like with the standard Dictionary.

Possible implementation

One possible implementation is to use a doubly linked list in which forward and backward pointers are encoded as offsets from the base address of a memory buffer. That way, the buffer can be copied or resized without invalidating the offsets, which is necessary to implement dynamically resizable containers and copy-on-write, respectively.

An index in that container is simply an offset from the base address of its internal buffer. It is stable, since insertion or removal merely changes the forward/backward pointers of the neighboring elements.

before removal:
  list: (nil, "a", +1) <-> (+0, "b", +2) <-> (+1, "c", nil)
  buffer: [(nil, "a", +1), (+0, "b", +2), (+1, "c", nil)]
remove(at: +1):
  list: (nil, "a", +2) <-> (+0, "c", nil)
  buffer: [(nil, "a", +1), nil, (+0, "c", nil)]

Note: empty "cells" can be used to store the index of the next empty cell, so that space freed after a removal can be reused for the next insertion.

This strategy sacrifies contiguous storage for stable indices and O(1) insertion/removal. An important drawback is that indices are not strideable (in the sense of Swift's Strideable protocol) without access to the container. Therefore, one cannot conform to Collection without having indices hold a pointer to the collection's buffer.

I have an (unprofiled, unoptimized, barely tested) implementation of that strategy here. I would be happy to work on a pull request, should there be any interest in merging an ordered collection with stable indices.

Other implementations

StableVec offers arrays with stable indices in Rust. The implementation is simpler than my proposal, relying on a standard array (or vector in Rust's parlance) instead of a doubly linked list. However, index stability is only preserved for removal, not for insertion.

kyouko-taiga avatar Sep 04 '21 11:09 kyouko-taiga

I'd like to suggest adding an ordered collection similar to Swift's Array, but with stable indices.

This sounds like it might be an interesting addition, if there is a good data structure for it. We need to carefully consider what performance we'd like this container to provide for its basic operations. For example, is there a data structure that would allow it be a RandomAccessCollection?

Stable indices can be used to store the identity of a value in another data structure. For instance, one may wish to store some large elements in a dynamically resizable container and refer to them in an adjacency list, to represent a graph connecting these elements.

It seems to me that the usual practice when one needs a stable index-like thing is to use reference semantics. We have several options for this:

  1. Represent the items with a class type. The class values are then stable references.
  2. Store the items in a dictionary or set with the keys serving as the stable references. This is probably the most flexible option.
  3. Store the items in a slab allocator, like a Swift version of StableVec below. The offsets into the slabs will serve as stable references.
  4. etc

Once we decide on a way to map a stable reference to an actual value in O(1) time (whether through dictionary lookup, pointer dereferencing, or indexing into stable storage), then we can use simple Arrays of these references to represent ordered collections of these. Linear search is more efficient in an Array than a linked list; and insertions in the middle are also quite fast as long as the Array is relatively small, as I'd expect it would be for a list of instructions in a basic block.

OrderedDictionary and OrderedSet are in a sense just self-contained implementations of (2) above.

As a result, basic blocks can adopt (mutable) value semantics. An instruction is merely a tag with an immutable list of operands, uniquely identified by its index in a basic block.

I see, so duplicate instructions are allowed, but the logical position of an instruction is part of its identity within the context of the basic block. In a sense, the block injects additional information to distinguish between individual instructions that isn't captured solely in their individual values.

This feels very similar to the case of a String that implements a conflict-free merge operation: for the merges to work, individual characters need to be uniquely identified in a way that is preserved across insertions/removals -- an operation like "insert 'a' after 'b'" can only be done reliably if we know precisely which 'b' it originally referred to.

This line of thinking leads me to suggest a scheme where a basic block would be an Array of (ID, Instruction) pairs, where the ID is a sequential integer identifier, uniquely identifying each generated instruction. The ID (or a whole pair, given that Instruction is immutable) would serve as the stable index, with O(n) lookups. As long as the basic blocks are relatively small, this would be quite efficient -- it's like using a slab allocator, but without the separate slabs. This scheme requires no new data structure implementation; it's just standard algorithms on a regular Array. (Or two arrays if the ids were to be stored separately from instructions.)

If the blocks can grow to contain many thousands of items, then it may be worth using a more advanced data structure. (For example, a persistent tree-based list (such as a B-tree augmented with subtree counts), would provide O(log(n)) insertions/removals and offsetting, although it has O(n) ID lookups without additional support structures. We have a B-tree implementation (specialized for SortedSet/SortedDictionary), although it hasn't landed yet.)

A Dictionary has stable indices but does not preserve the order in which elements have been inserted.

Dictionary indices aren't stable -- they get invalidated on every removal and every insertion that resizes the hash table. The documentation does explicitly promise that insertions into preexisting capacity do not invalidate indices, though, but that seems like a very fragile thing to rely on. (FWIW, I think this promise is more trouble than it's worth: it disallows interesting hash table optimizations like Robin Hood hashing, for relatively little practical benefit. It's almost never a good idea to hold onto dictionary indices after a mutation (or, for that matter, ever).)

Of course, dictionary keys are very stable, and can be converted to indices in expected O(1) hash/== operations.

First, as far as I am aware, OrderedDictionary does not support insertion at a random position.

Check out updateValue(_:forKey:insertingAt:) and updateValue(forKey:insertingDefault:at:with:)!

Note that these are linear time operations like Array.insert(_:,at:), but they have largish constant factors, because they need to update the cached indices of subsequent (or previous) items in the hash table.

OrderedDictionary optimizes for fast lookups and efficient memory use, in favor of insertions/removals in the middle of the collection.

Second, even if the aforementioned feature was actually available, an ad-hoc object would still be required to generate unique keys, just like with the standard Dictionary.

I'm not sure I fully get this -- it doesn't seem particularly onerous to create one-off types in Swift like the sequential ID type above.

It also does not offer a method to retrieve the first/last key of a particular value (i.e., equivalences for firstIndex(of:) and lastIndex(of:) of Swift's ReversibleCollection). As a result, there is no way to compute distances with that data structure alone.

I don't understand what you mean here. Swift does not currently have a ReversibleCollection, only a ReversedCollection, but that seems unrelated. [Ordered]Dictionary.Values does offer the standard firstIndex(of:)/lastIndex(of:) collection extensions, although these implement a linear search.

One possible implementation is to use a doubly linked list in which forward and backward pointers are encoded as offsets from the base address of a memory buffer. That way, the buffer can be copied or resized without invalidating the offsets, which is necessary to implement dynamically resizable containers and copy-on-write, respectively.

Linked lists aren't entirely out of scope for this package, but they have some major disadvantages that make them undesirable as short term additions. I'm particularly worried about the memory overhead of storing two full-sized integers per each element, and there is also the issue of very slow iteration/navigation performance due to suboptimal reference locality.

I'd prefer if we kept linked list implementations out of this package for now, and see how far we can go without them.

StableVec offers arrays with stable indices in Rust.

That looks like a slab allocator, a.k.a. an array with tombstones. A quick-and-dirty implementation would be to use an array of optionals -- this works pretty well as long as the Element type has an unused bit pattern for representing the nil value. However, tombstones interfere with index offsetting and distance calculations -- they become linear time operations. Therefore, these constructs aren't really intended to be used as Collections directly. (The slab project looks like a more elaborate version of StableVec, and it explicitly calls this out.)

IIUC, the idea with these is generally that you would never want to insert an item "in the middle" of them, or to look up the "nth non-empty item" within their storage -- rather, they are intended to serve only as primitive providers of bulk storage. Other data structures would then contain references to particular items within a slab, using a direct offset that addresses a specific slot within it. So e.g. in the LLVM IR case, a basic block's instructions could be represented by a simple Array of slot numbers within the slab allocator.

A slab allocator could be an interesting addition to this package. The primary issues as I understand it is (1) deciding how exactly to deal with attempts to dereference deleted slots, especially if such slots are allowed to be reused by future insertions; and (2) deciding whether it makes sense for a slab allocator to provide Collection conformance or the usual copy-on-write value semantics. (I suspect not.)

Unlike with reference counting, there is no good compilation- or runtime enforceable way to detect if a slot has outstanding references at the time of its deletion. At minimum, the data structure must be able to reliably detect & trap on references to deleted slots, but it may be okay to return invalid values if the slot has been reused for a different value.

I imagine a simple slab allocator would be represented with an array of fixed-sized(?) buffers with two tail allocated regions in each, holding an allocation bitmap followed by the actual raw storage. (A complication is that ManagedBuffer doesn't support heterogeneous storage.)

lorentey avatar Sep 08 '21 00:09 lorentey

Thanks for your answer!

This sounds like it might be an interesting addition, if there is a good data structure for it.

Great!

This line of thinking leads me to suggest a scheme where a basic block would be an Array of (ID, Instruction) pairs, where the ID is a sequential integer identifier, uniquely identifying each generated instruction. The ID (or a whole pair, given that Instruction is immutable) would serve as the stable index, with O(n) lookups.

That's an interesting idea. I guess it would be an appropriate representation for the specific use case that I presented.

I'm a little worried about the linear search, though. While I don't expect basic blocks to routinely contain thousands of items, lookups at random position will be extremely frequent when I'll navigate through use lists. I also expect insertions and deletions to be relatively frequent during optimization passes. Of course, my concerns are very speculative at this point.

As you suggested, I could maintain an administrative hash table to accelerate lookups into a tree. However, I'm under the impression that switching to a tree-based storage would weaken the strongest argument against a linked list (i.e., slow traversal) while also slowing down insertion/deletion.

Dictionary indices aren't stable -- they get invalidated on every removal and every insertion that resizes the hash table.

Sorry, I shouldn't have said that dictionaries had stable indices, but stable keys (obviously).

Check out updateValue(_:forKey:insertingAt:) and updateValue(forKey:insertingDefault:at:with:)!

Oh, I'm sorry I missed that. I didn't search for the right method names 😅 Most of my objections against OrderedDictionary are moot then.

I don't understand what you mean here.

I think I meant BidirectionalCollection, with an efficient way to get the last index and reverse the collection (hence the "reversible").

For distances, I guess we could search for the index of a key in an OrderedDictionary and look for the next key in O(n). The same for offsetting.

I'm particularly worried about the memory overhead of storing two full-sized integers per each element, [...]

That's a valid concern. One quick and dirty trick that I've implemented is to use half integers to store the forward and backward offsets, assuming that the user is unlikely to insert more than 2^32-1 elements in the list (I used one bit to encode a null pointer). Therefore, a single cell takes 2 words to store a class-bound type.

I guess the same hypothesis does not hold on a 32-bit architecture, where a max capacity of 2^16-1 elements sounds rather limiting.

and there is also the issue of very slow iteration/navigation performance due to suboptimal reference locality.

Reference locality isn't that bad in this particular case (again, quite speculative) because the list does not allocate scattered pointers across the heap; everything is actually in the same buffer. In the best scenario, contents is even laid out contiguously (intermixed with a pair of half integers for each element):

let array = [100, 200, 300]
let list = StableDoublyLinkedList(array)
// list's buffer looks like that: [(-1,1,100),(0,2,200),(1,-1,300)]

Here, each cell is a tuple (prev: UInt32, next: UInt32, elem: Int). If you iterate over the list, then you actually just iterate over a contiguous buffer of integers with a larger stride. Of course, the situation will deteriorate as you insert/delete at random positions.

I'd prefer if we kept linked list implementations out of this package for now, and see how far we can go without them.

I understand.

(1) deciding how exactly to deal with attempts to dereference deleted slots, especially if such slots are allowed to be reused by future insertions; and (2) deciding whether it makes sense for a slab allocator to provide Collection conformance or the usual copy-on-write value semantics. (I suspect not.)

Why couldn't the slab allocator keep track of the liveness of every slot? I would have imagined that it offered a kind of alloc/free API to its users, returning slot indices rather than pointers.

In terms of implementation, your administrative data structure would just have to be slightly larger than a bitmap to either count references, if slots can be retained and released arbitrarily often, or (more likely) mark slots as being used if you assume that the application must designate one user as the "owner".

In the latter case, you can probably implement copy-on-write the usual way and it is simple to cleanly deinitialize your objects (assuming you know their type) if the designated "owner" never shows.

I imagine a simple slab allocator would be represented with an array of fixed-sized(?) buffers with two tail allocated regions in each, holding an allocation bitmap followed by the actual raw storage.

IIUC, I implemented a similar data structure to represent virtual registers for my IR interpreter.

A slab would be a "frame". One is pushed onto a stack every time the interpreter calls a function, representing the memory of the callee's registers. I use a stack, because I always allocate new registers in the most recent frame, but an array of buffers would be just as good.

A lot of the complexities in my implementation are due to the fact that slots (i.e., registers) can contain heterogeneous data. Hence, I have to take care about size and alignment, and I need the user to tell me how to deinitialize a slot. If slabs contain homogeneous data, these problems would simply disappear.

kyouko-taiga avatar Sep 08 '21 08:09 kyouko-taiga

Is this close to what you had in mind for the slab allocator? https://gist.github.com/kyouko-taiga/7b6e58cfc8bc56d43ef520bdf26919b9

  • The collection is implemented as an array of buffers (which I called buckets), laid out contiguously.
  • Indices are stable. Whenever something is removed, the corresponding bucket's allocation bitmap is updated to mark the slot free, allowing it to be reused for another insertion.
  • The same bitmap is used to detect invalid dereferences.
  • Since elements are homogeneous, it's easy to just check the bitmap to deinitialize outstanding elements; there's no need for manual memory management.
  • The whole data structure has (mutable) value semantics, implemented the usual way.

I didn't use ManagedBuffer because I couldn't find a simple way to specify the size of each bucket. I guess one trick could be to define a large tuple. That could remove one level of indirection to reach a particular element. Currently there're 2: one for CoW and one for the "array" of buffers.

Quick and dirty measurements suggest that inserting is as fast as appending in a Swift array with reserved capacity. The same goes for traversal and for removal at random positions.

kyouko-taiga avatar Sep 21 '21 22:09 kyouko-taiga