rhombus-prototype
rhombus-prototype copied to clipboard
First pass at generic collection interfaces
This is a rough draft of a generic collection API for Rhombus, as discussed in #201 and #221. This is nowhere near complete, and I'm only opening this draft pull request so that others can see what I've done so far and offer early feedback.
Maybe this should be deferred to real-time discussion, but my immediate impression is that this is much too stateful. I think List
should mean an immutable value, not something that is potentially a MutableList
. The base iteration protocol should be functional, and not based on stateful iterator objects. To the degree state is involved, I worry that the interaction with thread-level concurrency needs to be clear from the start (i.e., the checks in this draft seem to detect a kind of concurrent modification, but not current in the sense of threads).
I think
List
should mean an immutable value, not something that is potentially aMutableList
.
My original plan was to have List
, MutableList
, and ListView
instead of ImmutableList
, MutableList
, and List
, but I forgot about that. I'll change the names around to reflect the original plan.
The base iteration protocol should be functional, and not based on stateful iterator objects.
I'd like iteration to not require allocation, since it's going to occur in tight loops very frequently. And I'd prefer it not to involve multiple return values, because that's a pretty clunky API to work with in my experience. Given those constraints, perhaps something like this is more in line with what you're imagining:
interface Sequence:
method iterate() :: Iterator
interface Iterator:
method isDone() :: Boolean
method element() // throws if isDone()
// might return the same iterator if iteration reuses a single mutable iterator object for performance reasons
method next() :: Iterator // throws if isDone()
Those seem like a good improvements!
Still...
I'd like iteration to not require allocation, since it's going to occur in tight loops very frequently.
That seems like a reason to not have an iterator object. For example, if you're going through a sequence of fixnums or a cons-based list, a stateful approach creates allocation.
I get why you might shy away from something like the Racket sequence
protocol, but its functional approach and use of multiple return values are very much aimed at performance. There way be ways to simplify that protocol and support somewhat fewer interactions with for
; I'm not sure. But something that stays closer to that protocol as a starting point (perhaps with more convenient layers built on top) would more certainly handle the kinds of iteration patterns and performance that we'll want.
I think there are some tricky tradeoffs with allocation-free iteration. You can see this with the hash iteration protocol, which accomplishes it by managing to squeeze the iteration state into a single fixnum, but that requires considerable implementation complexity (eg bnode-entry-at-position
) as well as not necessarily working well for every data structure and potentially being less performant than a cursor that could store the pointer to the relevant bucket more directly. On the other side, as you say, an imperative iterator object for iterating through fixnums will be quite expensive, especially with the current set of optimizations that Racket and Chez Scheme provide.
Part of the question will come down to what we imagine the common case to be. If we're mostly iterating through fixnums, cons lists, and Racket-style vectors, then without big optimization changes the current functional approach will likely be significantly better for performance. But if most Rhombus data structures are more complex (eg automatically-resized vectors, hash tables, etc), or are less likely to be built-in, or are more likely to be thread-safe from the perspective of hardware threads (often requiring more indirections), or are more likely to be accessed through generic interfaces then potentially other approaches will be more performant.