kotlinx.collections.immutable icon indicating copy to clipboard operation
kotlinx.collections.immutable copied to clipboard

Rename from immutable to persistent

Open FranciscoE-Hudl opened this issue 8 years ago • 13 comments

There is some immediate backlash about immutable collections by Java users, as their understanding of the concept is either getting a full copy per operation, or receiving an exception on mutating methods.

The persistent keyword along with some references to Big O perf in the documentation may help spread the understanding of the difference.

FranciscoE-Hudl avatar Jul 16 '16 22:07 FranciscoE-Hudl

I like this idea, but it looks like implementations of the ImmutableCollection interface don't have to be persistent. That being said, perhaps the concrete implementations should contain the word Persistent.

cbruegg avatar Jul 19 '16 16:07 cbruegg

In the Java world, "Persistent" has always meant "stored in a database". It's the "P" in "JPA", which is why I steered away from it. Either way, whatever words you use for immutable/persistent collections should be less typing than the name for Mutable ones (and Unmodifiable ones) in order to steer lazy programmers toward doing the safest thing.

GlenKPeterson avatar Jul 27 '16 14:07 GlenKPeterson

I think the best term is "functional". "Immutable" is too narrow, suggesting mere immutability; the Guava immutable collections, for example, have no operations to construct new versions of the collections. "Persistent" (besides having another common meaning) describes a fact about the performance profile of the collection, not its semantics: one could create a persistent mutable collection class with an operation that created and saved a snapshot (arguably this is what a ZFS or BTRFS filesystem is).

"Functional", to me, says both that the type has value semantics -- that it's not just immutable but has useful new-version operations -- and that it has a performance profile appropriate for functional programming. The latter requirement is certainly easiest to satisfy if the type is also persistent.

slburson avatar Aug 05 '16 00:08 slburson

I agree the use of the term immutable on the proposal is misleading. The behaviour suggests effectively immutable and leans more to Okasaki's definition of a persistent data structure. Having used Clojure, it would be great if this paradigm was also fundamentally available in Kotlin.

bigkahuna1uk avatar Feb 23 '17 19:02 bigkahuna1uk

For me, immutable is exactly what the name implies, something that cannot be changed. I also expect to get a new object upon every invocation of any method that might mutate it. Actually, I would expect those methods to be not available at all but that ship has sailed. I have to admit that my background is not so JVM centric, however, searching for immutable collections is what brought me here. Especially because there are already Map and MutableMap, the natural conclusion is to look for ImmutableMap (well, actually I expected that Map is immutable but alas).

Fleshgrinder avatar Feb 19 '18 19:02 Fleshgrinder

An observation: There are actually two common use cases here, not just one.

Firstly, some users want to construct an mutable collection object or builder, add a bunch of items to it, and then construct an immutable object from that. At the time of construction of the immutable object, these users would typically like the immutable object to be "optimally" constructed such that future read-only access to the immutable collection's contents will be as fast as possible, and there is no extra wasted space (using optimal page sizes for hashtables, optimally-sized arrays, special case subclasses for small or empty collections, etc.). This kind of optimization is possible because all items to be contained by the immutable object are supplied at construction time, and it is known that no more items will be added later (to that object, anyway). This is the model used by Guava's ImmutableList, ImmutableSet, etc. classes. (Consider ImmutableSet.Builder, ImmutableSet.copyOf, etc.) In this case, for instance, an immutable List subclass might typically be implemented internally as an array with a size exactly equal to the number of items to be held (O(1) access time, O(N) data space usage). A small immutable hashtable might be implemented with a custom-generated perfect hashing function and a hash array exactly equal to the number of elements. A tree might be perfectly balanced. And so forth.

Secondly, some other users want to construct an immutable "persistent" collection object incrementally, quickly and cheaply, with the intention that additional persistent collection objects will be constructed later to reuse parts of that object. This is the model typically used by functional languages (LISP, etc.). In this case, heavy optimization as items are added may actually be a negative, since ease and speed of creation of individual lightweight "parent" collection objects (that reference other "child" collection objects, etc.) will dominate. If you are going to be adding 50,000 items to a List that is essentially a recursive singly-linked list, you don't want to be trying to do an optimization pass after each item is added, and you are probably OK with additional space overhead being required for "next" references. In this case, for instance, an immutable List might typically be implemented internally as an immutable "head" object containing a reference to another List which contained the "tail" objects: Less memory efficient than an array, and less efficient to do random accesses on (O(N) instead of O(1) for an array), but still easy and quick (O(1)) to add elements to the beginning without having to reallocate/copy an entire array each time. Lots of other implementations (e.g. Ropes) are possible, of course.

My point is that these two cases and usage models are really quite different: In one case, you want to supply all items at construction time and do heavy optimization for read access up front, with the idea that adding new items will be very rare or nonexistent, and in the other case, you want to be able to add new items quickly and easily, with optimization for memory usage and read access probably less of a priority. The "Immutable" vs. "Persistent" models are not just issues of naming or the presence or absence of copy-and-modify functions: They are fundamentally different use cases, and I think that the Kotlin libraries will probably need to use different internal data structures in order to be written effectively for the two different use cases.

Sporking avatar Mar 13 '19 10:03 Sporking

@Sporking I partially agree with you. Ok, there are -at least- two use case, but the main difference is an implementation detail. Mutable list can have multiple implementations with different behaviours (ArrayList vs LinkedList). So I don't see different models, only different implementations for -as you suggested- different use case (ImmutableArrayList vs ImmutableLinkedList).

fvasco avatar Mar 13 '19 10:03 fvasco

Something to consider: Can you create an ImmutableList object with 1000 items, and then use the 'Collection.plus' function to create a PersistentList object which adds an additional item to the head of the list but delegates to the Immutable object for its tail (without having to copy the elements)? It seems to me that this should be possible, although the current API doesn't seem to allow it.

I would like to see the Collection.plus(item) extension function revised to construct an appropriately typed collection regardless of the actual subclass of List it was invoked on: If called on a MutableList, it would return a new MutableList (or ArrayList) starting with the new item, but if called on a PersistentList it would return a new PersistentList with the new item at the head and the old PersistentList as a tail, and if called on an ImmutableList it would return a new PersistentList with the new item at the head and the old ImmutableList as a tail. Iterable.plus could behave similarly.

I think that the copy-and-mutate functions like 'plus', 'minus', etc. could probably be placed (as extension functions) on the read-only interfaces like Collection, List, Iterable, etc. rather than being defined only in subclasses like PersistentList. This would allow calling code to not need to care what specific type of List was passed into a function: It could create modified copies of that list regardless. Subclasses could be more specific as to the types to be returned (via return type polymorphism): ImmutableList.plus could return an ImmutableList, for instance, and PersistentList.plus could return a PersistentList.

Sporking avatar Mar 13 '19 10:03 Sporking

@fvasco I agree with what you are saying with regards to different implementations. What I was describing above mostly relates to the interfaces of the lists (what operations are invokable on them, and what the return types of those operations would be) rather than to which specific subclasses of those interfaces exist.

With that said, I can see little use for a Java-style ImmutableLinkedList class since it is strictly worse than an ImmutableArrayList class, both in terms of access time and in terms of data size. The main reason to want to use a linked list class is to be able to perform O(1) inserts or deletions at the beginning or in the middle of the list, and in the case of an immutable list, these aren't possible anyway since the list is immutable.

Sporking avatar Mar 13 '19 10:03 Sporking

With regards to naming prefix, I would like to see "Hard", "Frozen", "Fixed", or other similar prefixes used rather than "Immutable" (which I think is rather long and hard to pronounce).

I also agree with @GlenKPeterson that "Persistent" in the Java world is pretty consistently used to mean "something which will survive the currently running program being terminated and restarted" (typically data structures saved via a disk file, a database connection, etc.) and I would prefer that the term not be redefined to mean something different here. I don't have a clear alternative, though. "Delegating" seems too long. "Recycle"? "Prepend" seems kind of attractive. (PrependList, PrependSet, etc.) although perhaps not as useful where the top-level item removes, rather than adds, items. "Prefix" has similar issues, but seems a bit better. "Linked" would be a possibility if it weren't already used for java.util.LinkedList. "Chained" or "Chain" perhaps?

Sporking avatar Mar 13 '19 11:03 Sporking

Another thing to consider is that one of the defining characteristics of Lisp-style functional data structures tends to be the ability to do an operation similar to 'tail()' in a very efficient fashion, to allow recursive algorithms using tail-call optimization. Lisp-style persistent data structures allow this as an O(1) operation with no extra allocations necessary. Array-based data structures, on the other hand, might need to allocate some kind of temporary proxy object to be returned to represent the tail, which takes time and thrashes the garbage collector a bit. So the choice of data structures can come down to: "Do I want O(1) random access but an expensive tail() call" (If so, use an ImmutableList or similar class), or "do I instead want an O(1) tail() call, but O(N) random access?" (if so, use a PersistentList or similar class).

Sporking avatar Mar 13 '19 11:03 Sporking

Could immutable collections that can share content (currently PersistentCollection) be just called SharedCollection?

mcpiroman avatar Jan 17 '22 18:01 mcpiroman

Could you please share your thoughts on why the shared moniker would be better than persistent? At least there is "Persistent data structures" term in use, whose properties are complied by PersistentCollection.

qurbonzoda avatar Jan 18 '22 16:01 qurbonzoda