rhombus-prototype icon indicating copy to clipboard operation
rhombus-prototype copied to clipboard

First pass at generic collection interfaces

Open jackfirth opened this issue 2 years ago • 4 comments

This is a rough draft of a generic collection API for Rhombus, as discussed in #201 and #221. This is nowhere near complete, and I'm only opening this draft pull request so that others can see what I've done so far and offer early feedback.

jackfirth avatar Dec 20 '22 00:12 jackfirth

Maybe this should be deferred to real-time discussion, but my immediate impression is that this is much too stateful. I think List should mean an immutable value, not something that is potentially a MutableList. The base iteration protocol should be functional, and not based on stateful iterator objects. To the degree state is involved, I worry that the interaction with thread-level concurrency needs to be clear from the start (i.e., the checks in this draft seem to detect a kind of concurrent modification, but not current in the sense of threads).

mflatt avatar Dec 20 '22 01:12 mflatt

I think List should mean an immutable value, not something that is potentially a MutableList.

My original plan was to have List, MutableList, and ListView instead of ImmutableList, MutableList, and List, but I forgot about that. I'll change the names around to reflect the original plan.

The base iteration protocol should be functional, and not based on stateful iterator objects.

I'd like iteration to not require allocation, since it's going to occur in tight loops very frequently. And I'd prefer it not to involve multiple return values, because that's a pretty clunky API to work with in my experience. Given those constraints, perhaps something like this is more in line with what you're imagining:

interface Sequence:
  method iterate() :: Iterator

interface Iterator:
  method isDone() :: Boolean
  method element() // throws if isDone()
  // might return the same iterator if iteration reuses a single mutable iterator object for performance reasons
  method next() :: Iterator // throws if isDone()

jackfirth avatar Dec 20 '22 06:12 jackfirth

Those seem like a good improvements!

Still...

I'd like iteration to not require allocation, since it's going to occur in tight loops very frequently.

That seems like a reason to not have an iterator object. For example, if you're going through a sequence of fixnums or a cons-based list, a stateful approach creates allocation.

I get why you might shy away from something like the Racket sequence protocol, but its functional approach and use of multiple return values are very much aimed at performance. There way be ways to simplify that protocol and support somewhat fewer interactions with for; I'm not sure. But something that stays closer to that protocol as a starting point (perhaps with more convenient layers built on top) would more certainly handle the kinds of iteration patterns and performance that we'll want.

mflatt avatar Dec 20 '22 06:12 mflatt

I think there are some tricky tradeoffs with allocation-free iteration. You can see this with the hash iteration protocol, which accomplishes it by managing to squeeze the iteration state into a single fixnum, but that requires considerable implementation complexity (eg bnode-entry-at-position) as well as not necessarily working well for every data structure and potentially being less performant than a cursor that could store the pointer to the relevant bucket more directly. On the other side, as you say, an imperative iterator object for iterating through fixnums will be quite expensive, especially with the current set of optimizations that Racket and Chez Scheme provide.

Part of the question will come down to what we imagine the common case to be. If we're mostly iterating through fixnums, cons lists, and Racket-style vectors, then without big optimization changes the current functional approach will likely be significantly better for performance. But if most Rhombus data structures are more complex (eg automatically-resized vectors, hash tables, etc), or are less likely to be built-in, or are more likely to be thread-safe from the perspective of hardware threads (often requiring more indirections), or are more likely to be accessed through generic interfaces then potentially other approaches will be more performant.

samth avatar Dec 20 '22 17:12 samth