Fable
Fable copied to clipboard
Multidimensional array support
There was a request to support multidimensional arrays in Fable. At first I thought this was coming from the BCL but it's actually part of FSharp.Core and F# has some syntax support for multidimensional arrays (indexing) so probably it'd be nice to support them for "completeness". They are not usually needed in frontend apps, that's why most people haven't missed them, but they can make sense for Python and Rust.
At first I was hoping to "borrow" the implementation from FSharp.Core but then I realized that it just emits IL in crucial places like creation and indexing: https://github.com/dotnet/fsharp/blob/a7b668136cc720e50b92b0cae0f06081c5df1117/src/FSharp.Core/array2.fs#L47-L50
So we need to decide how we will represent multidimensional arrays by ourselves. Of course this can be different per language if it improves performance. But it'd be nice to have a common implementation that we can reuse. At first I was thinking of using arrays of arrays, but now I believe it's probably better to use a single flattened array as F# multidimensional arrays cannot be sparse. This will have benefits in JS for numeric arrays and I assume it's more or less how multidimensional arrays in NumPy work too.
Thoughts? @ncave @dbrattli
@alfonsogarciacaro From glancing at the code some time back, the IL emitted is just simple array element get/set, so we can probably reuse most of it. Either way would work, but yes, if we stick with flattened array implementation we can probably have single implementation across languages that have typed arrays like JS.
@alfonsogarciacaro Arrays are a huge mess in Python except for NumPy, but I'm reluctant to take a direct dependency on a massive library like NumPy even if it would solve many problems (most non data science use-cases would not be happy with such a dependency). For Python there's the builtin array module that no-one uses and only handles numeric values. There's the bytearrays which is what you would normally use for interop with e.g networking libraries that requires the immutable bytes counterpart. Other types would need to be transpiled to lists as fallback. Then there's the generic type mess to handle as well when you don1t really know what you will receive. NumPy could be a compile time option you could enable but it should not be required by default imo.
For some time I had the plan to take a dependency on Expression (again) since I have added a typed array type there that hides some of the mess and different backing stores, but I have not been able to make a decision yet to use it for Fable or not: https://github.com/cognitedata/Expression/blob/main/expression/collections/array.py
There's also the problem of List.fs using Array.fs as backing for many of the operators. This is most likely a significant performance penalty for Python (vs other languages). Lists should probably be compiled to Python tuples instead which are backed by fast c-implementation.
PS: all of this is what blocked me back in February (and I have still not been able to make progress) 😬
@dbrattli I would recommend not spending too much time trying to optimize lists. @ncave and I have already tried it several times and have failed 😉 I know it looks like a very good opportunity for performance since most F# devs use it as the default collection (syntax encourages that) but when we try to back the list with an array we improve the performance the cases of lists used as arrays, but decrease the performance of lists used as stacks (as lists should be used), which is not good. I think the only "hack" we have atm is making the tail mutable internally for faster reverse operations: https://github.com/fable-compiler/Fable/blob/c905697798185849f1cfe96cc36c2066dfee8380/src/fable-library/List.fs#L13-L18
It'd be nice to have a compiler option to use numpy for the arrays as this seems to be the standard for data science so it would make it easier to interact with existing Python scripts. But you're right it should probably not be the default.
@alfonsogarciacaro
I would recommend not spending too much time trying to optimize lists.
I think @dbrattli just meant to say that the List module method implementations that depend on the Array module may be inefficient if the Array module implementation is inefficient.
Ah, sorry, my bad! 😅