[Discussion] Immutable defaults for mutable JS API:s
Following the discussion about immutable defaults in https://forum.rescript-lang.org/t/ann-rescript-core-a-new-batteries-included-drop-in-standard-library/4149, let's discuss which API:s are problematic and what we can do to remedy that.
First off, there's a few things we need to keep in mind in this discussion:
- A goal is to stay close to the JS API:s (deliberately not saying "as close as possible", but close). This means we have wiggle room to brush over inconsistencies and "improve" the JS API:s we bind to so they make more sense for idiomatic ReScript usage, where it makes sense. But a complete overhaul of API:s isn't something we'll pursue for
Core. Not everyone is going to agree with this, and that's fine. But that's the way forward forCore. - For full immutability,
Beltis still a great alternative, andBeltisn't going anywhere, it'll keep being around.
With that said, providing immutable defaults where it makes sense is definitively a good idea. So, let's detail what API:s are problematic right now. Please write down the API:s with mutability you worry about, and we'll discuss what we can do about it, if anything.
Let's also discuss what we can do about the mutable API:s to make it apparent that they're mutable. A few things come to mind:
- Return
unitalways, even if the underlying JS API returns itself mutated. One example for this isMap.setwhich mutates and returns itself. This would returnunitinstead to signify that it's mutating. - Naming. This is already the case, suffixing functions mutating the original thing with
inPlace. We can check that that's consistent. - Doc strings. Have a bolded
**This mutates the original array**on top of each doc string, plus suggesting the immutable alternative. This will make it very apparent that you're using a mutable API as you're working in your editor.
Also, there's https://github.com/tc39/proposal-change-array-by-copy. The question is how far we should go from the JS API
From perspective of a ReScript module API, making dangerous functions more visible is good, but I’m not sure if it’s good enough. When a module Foo has no mutating methods, you have a strong guarantee that Foo.t is immutable, and the invariant new value == new ref is enforced. Which have consequences for pattens that take advantage of this invariant, e.g. the React hooks with dependencies arrays.
But when you simply discourage mutating, say, arrays, instead of completely ruling it out, you don’t have those guarantees. In a team of app developers that have other things on their mind than just remembering not to mutate things, you’d probably at least want a linter to strongly discourage calling *inPlace functions. And I think a totally immutable Array module (with a way to copy-and-convert from/to a mutable array) is safer than a linter.
That said, it’s rather weird to omit bindings when binding to Array.prototype, and a totally immutable Array module might actually belong in Belt.
Return unit always, even if the underlying JS API returns itself mutated. One example for this is Map.set which mutates and returns itself. This would return unit instead to signify that it's mutating.
I’m torn on this. I like the idea of clearly separating getters and setters, but then bindings like this are not true to the underlying API, which might be too opinionated for a standard JS lib, and I bet some people are going to miss the chained calls those return values enable.
A goal is to stay close to the JS API:s
I totally agree going on this direction, for me it's more important to be able to use the real JS api, and be consistent with the real JS api, than having some pure immutable layer, and for several reasons including performance and porting existing JS code.
Javascript it's mixed, in the sense that has both mutable and immutable APIs, in the case of Array, for example one cannot expect expect to be close to JS without mutability.
Return unit always, even if the underlying JS API returns itself mutated. One example for this is Map.set which mutates and returns itself. This would return unit instead to signify that it's mutating.
This makes sense even though there are some benefits like chaining, it's clearer to leave mutable operations as unit
Naming. This is already the case, suffixing functions mutating the original thing with inPlace. We can check that that's consistent.
I would prefer to be consistent with JS api if one really cares about the JS ecosystem, it may be consistent with rescript ecosystem, but not with JS. Being consistent with JS apis will also allow to have better tools for generating rescript bindings from typescript/javascript
Fo example Scala.js has a set of both mutable and immutable collections, while also keeping the JS.Array for interop, and their JS api the closest to native JS, because they have plenty of their own Scala standard library without having to sacrifice any JS api consistency,
which in turns enables Scala.js code converters like ScalablyTyped to use most of the Typescript ecosystem, with automatically generated bindings.
That said, it’s rather weird to omit bindings when binding to
Array.prototype, and a totally immutableArraymodule might actually belong inBelt.
Yeah this pretty much sums it up well I think. We're trying to stay close to the JS array API, which is ultimately what we're binding to, not just as a backing structure. An immutable array module is of course very welcome in Belt, or as something standalone.
I haven't really made up my mind yet, but I want to share where I'm coming from and what my thoughts are to facilitate further discussion:
My hope / expectations for a new rescript standard library has been a "full-fledged belt-like" lib embracing functional idioms.
I understand this is a non-goal of Core and absolutely see the necessity to have all JS bindings in one place and well documented.
Regarding immutability:
I don't have an issue per se with immutable APIs being present in Core.
I guess for me, it's all about positioning Core and setting expectations:
- Core are mainly bindings to vanilla js?
- But not exclusively?
- What's the maximum desired deviation from vanilla JS bindings?
- What's the guideline for deciding what to include and what's too far off?
- If Core were just bindings, the distinction on what to include here and in another "higher order" lib would be obvious. Including parts of Belt and other handy functions (but not bindings) blurs the line further.
- It would be technically possible to provide two distinct api surfaces: mutable / immutable - but this would unnecessarily split up the js bindings, therefore I don't like this approach. Duplicating APIs is even worse in my opinion.
- I'm also thinking about what would be the next step forward for a "functional immutable standard lib"? Would it make sense to vendor Core or use it as a dependency?
I'm aware this is becoming slightly off-topic, but I think the implications are pretty relevant to the discussion.
@woeps I'm guessing you saw this https://github.com/rescript-association/rescript-core#guiding-principles?
hoichi has a good point ‘When a module Foo has no mutating methods, you have a strong guarantee that Foo.t is immutable’
What about keeping the api very close to the original js (with a few small tweaks to remove warts etc.. eg returning unit for mutation) and separate immutable datatypes eg as per tc39 proposal. https://github.com/tc39/proposal-record-tuple
Yes, that's the best way forward here. Array binds to JS array, which is mutable, the same way that Set binds to the mutable JS Set. For immutable sets, we have Belt.Set. For immutable arrays, if there's demand for it and we have contributors willing to put in the work to build it, we could have either Core.ImmutableArray or Belt.ImmutableArray, depending on where we want to draw the line for what is included where.
I would be down to try and implement this.
I have some questions I would like feedback on first.
Where does this go?
Should it be in Belt or Core?
My thoughts are we could have Core.ImmutableArray that has most of the same functions as Core.Array but it just doesn't mutate things and returns a copy of the original array. That would leave room for us to do something like Belt.List for a more custom data type. Same thing for Core.ImmutableObject and something like Belt.Record. I took names from Immutable.js for these and was thinking of using that as a starting point? That could come later so it doesn't need to be decided now. We could also wait for Tuple and Record to be added to JS.
What should ImmutableArray and ImmutableObject do?
Should ImmutableArray only have immutable versions of mutable array functions like set? Or should it have most of the same functionality as Array? Should it follow a different design and be like an Immutable.js List?
Should Array and ImmutableArray be compatible?
I don't think you should be able to swap functions on these two and that they need distinct types. We could add functions to convert from one to the other like ImmutableArray.fromArray and Array.fromImmutableArray that would allow you to swap if needed.
How do you make an ImmutableArray?
If we want to do this without adding any syntax to the language I think we can use ImmutableArray.of([1,2,3]) to create one.
Do we even want Core.ImmutableArray or should we do Belt.List?
Rather than duplicating the API for Core.Array we could make a totally new thing that is simpler with just a few functions like set, get, filter, reverse, etc...
Technically, it's possible to have a single set of bindings that map closely to the JS API, and also have a subset of them work on a distinct immutable array type, by adding a phantom type argument.
// Instead of
type array<'a>
// we add a phantom type argument to specify mutability
type array<'mut, 'a>
// then a couple of abstract phantom types to put there
type mut
type imm
// and now you can define functions that work on every kind of array, or just one kind
@get_index external get: (array<_, 'a>, int) => option<'a> = "" // can be called on any kind of array
@set_index external set: (array<mut, 'a>, int, 'a) => unit = "" // can only be called on mutable arrays
Not saying this is a good idea, there are certainly significant downsides to it, like making the types and type error messages more complicated, but it does make for a smaller API surface and is quite intuitive I think.
Wouldn't that mean that while my call of set wouldn't mutate the value, but another call of that function could mutate it? I think that's why I like the idea of types. But in theory we could some up with a way to share the functions that are duplicated on each? Maybe some type of InternalArray that we don't export and use in Array and ImmutableArray.
Not sure what you mean. set mutates. On every call. But it can only be called on arrays that are tagged as mutable. A set function that makes a copy instead of mutating would have to live elsewhere, but that also goes with the idea of having the Array module stick close to the JS array API. You could instead have an Array.Immutable module with immutable versions of the mutable bindings though.
Sorry I wasn't clear.
If you have an Array.set function that mutates an array<'a> you wouldn't want to be able to pass it to an Array.Immutable.set function.
I think there should be a clear distinction between an array and an immutable array.
Technically, it's possible to have a single set of bindings that map closely to the JS API, and also have a subset of them work on a distinct immutable array type, by adding a phantom type argument.
// Instead of type array<'a> // we add a phantom type argument to specify mutability type array<'mut, 'a> // then a couple of abstract phantom types to put there type mut type imm // and now you can define functions that work on every kind of array, or just one kind @get_index external get: (array<_, 'a>, int) => option<'a> = "" // can be called on any kind of array @set_index external set: (array<mut, 'a>, int, 'a) => unit = "" // can only be called on mutable arraysNot saying this is a good idea, there are certainly significant downsides to it, like making the types and type error messages more complicated, but it does make for a smaller API surface and is quite intuitive I think.
The issue with that is going to be variance. Immutable arrays are covariant, and mutable ones are invariant (no subtyping at all). I don't think the language lets you have both in one definition.
As for where this should live, there was talks a while ago about a separate package dedicated to ReScript data structures, where this and several other things would fit well. Might be good to start with that, and then we can decide on where it should live more long term when it's tried and tested.
If you have an
Array.setfunction that mutates anarray<'a>you wouldn't want to be able to pass it to anArray.Immutable.setfunction.I think there should be a clear distinction between an array and an immutable array.
That is is exactly what I'm proposing. There would be no array<'a>, but instead array<mut, 'a> and array<imm, 'a>. Array.set would take only the former, Array.Immutable.set would take only the latter, and Array.get would take either.
The issue with that is going to be variance. Immutable arrays are covariant, and mutable ones are invariant (no subtyping at all). I don't think the language lets you have both in one definition.
Can you illustrate the issue with an example? If I understand correctly, covariance would just mean that if 'a is a subtype of 'b, then array<'a> is also a subtype of array<'b>, while if it's invariant that relation wouldn't transfer? And in practice just means you'd have to map it?
The issue with that is going to be variance. Immutable arrays are covariant, and mutable ones are invariant (no subtyping at all). I don't think the language lets you have both in one definition.
Can you illustrate the issue with an example? If I understand correctly, covariance would just mean that if
'ais a subtype of'b, thenarray<'a>is also a subtype ofarray<'b>, while if it's invariant that relation wouldn't transfer? And in practice just means you'd have tomapit?
Yes that's what it means. Combining both kinds of arrays in one type would give up the opportunity to make immutable ones covariant. This has come up a number of times, an it seems that having covariance is desirable.
As for where this should live, there was talks a while ago about a separate package dedicated to ReScript data structures, where this and several other things would fit well. Might be good to start with that, and then we can decide on where it should live more long term when it's tried and tested.
Would it be a good idea for me to start a repo and publish some type of alpha release for this? I could do @jvlk/rescript-immutable-data?
~~We'll need a way to convert a regular array into an immutable array. How would I go about that? The types from the input don't let me just take in an array and return it as an immutableArray.~~
Nevermind, I just needed a .resi file.
~~We'll need a way to convert a regular array into an immutable array. How would I go about that? The types from the input don't let me just take in an
arrayand return it as animmutableArray.~~Nevermind, I just needed a
.resifile.
You can't convert a mutable array to an immutable one without making a copy. It's unsafe.
Same for the other way round.
The issue with that is going to be variance. Immutable arrays are covariant, and mutable ones are invariant (no subtyping at all). I don't think the language lets you have both in one definition.
Can you illustrate the issue with an example? If I understand correctly, covariance would just mean that if
'ais a subtype of'b, thenarray<'a>is also a subtype ofarray<'b>, while if it's invariant that relation wouldn't transfer? And in practice just means you'd have tomapit?
I did not really answer your question: why it matters. And empty mutable array is tricky with typing. Just like references. An immutable one is not.
That's just one example.
Another one: how comes I can't type coerce array of this record to array of that record? (Question people asked). Because arrays are not covariant.
Ok, I based it off from the work done here in this thread: https://forum.rescript-lang.org/t/immutable-array-implementation/761
module Array: {
/* Put this in the interface file */
type t<'a>
let make: array<'a> => t<'a>
let isEmpty: t<'a> => bool
let head: t<'a> => option<'a>
let tail: t<'a> => t<'a>
let toArray: t<'a> => array<'a>
let map: (t<'a>, 'a => 'b) => t<'b>
let concat: (t<'a>, 'a) => t<'a>
let append: (t<'a>, 'a) => t<'a>
let get: (t<'a>, int) => option<'a>
let set: (t<'a>, int, 'a) => t<'a>
} = {
/* Put this in the module file */
type t<'a> = array<'a>
let make = arr => arr->Array.copy
let isEmpty = t => t->Array.length == 0
let head = t => t->Array.get(0)
let tail = t => isEmpty(t) ? [] : t->Array.slice(~start = 1, ~end = t->Array.length)
let toArray = make
let map = Array.map
let concat = (t, a) => [a]->Array.concat(t)
let append = (t, a) => t->Array.concat([a])
let get = (t, i) => t->Array.get(i)
let set = (t, i, v) => {
let c = t->Array.copy
c->Array.set(i, v)
c
}
}
let t1 = [1,2,3]
let t2 = t1->Immutable.Array.make->Immutable.Array.head // no error
let t3 = t1->Immutable.Array.head // error
I'm going to clean this up and add more functions and some basic documentation.
The way I have it above it's not a zero cost binding. Trying to see if I can do that now.
The more I poke at this I don't think there is a way to have this pass directly to JS functions directly. There has to be a middle layer for these types to work and to not mutate the input values.
But that's something I might be able to improve on.
The more I poke at this I don't think there is a way to have this pass directly to JS functions directly. There has to be a middle layer for these types to work and to not mutate the input values.
But that's something I might be able to improve on.
On needs to decide what immutable means. To some people it means: don't mutate. To others it means: make a copy for every single operation. The former can be done with just omitting mutating functions. The latter cannot.
To me I would expect it not to mutate and to return a copy.
The first case of not mutating can be done just simply not using the functions that could be omitted.
I did not really answer your question: why it matters. And empty mutable array is tricky with typing. Just like references. An immutable one is not.
That's just one example.
Another one: how comes I can't type coerce array of this record to array of that record? (Question people asked). Because arrays are not covariant.
Thanks for the examples! These problems aren't going to go away by adding a separate second-class immutable array type though. As long as the default is a mutable array, and that is what's used by all sorts of bindings and such, people are still going to stumble across these issues and get confused. And at that point, are these issues severe enough that they'll choose to switch over to an immutable array, converting to and from mutable arrays where needed and adapt to different usage patterns?