Fable icon indicating copy to clipboard operation
Fable copied to clipboard

Multidimensional array support

Open alfonsogarciacaro opened this issue 3 years ago • 5 comments

There was a request to support multidimensional arrays in Fable. At first I thought this was coming from the BCL but it's actually part of FSharp.Core and F# has some syntax support for multidimensional arrays (indexing) so probably it'd be nice to support them for "completeness". They are not usually needed in frontend apps, that's why most people haven't missed them, but they can make sense for Python and Rust.

At first I was hoping to "borrow" the implementation from FSharp.Core but then I realized that it just emits IL in crucial places like creation and indexing: https://github.com/dotnet/fsharp/blob/a7b668136cc720e50b92b0cae0f06081c5df1117/src/FSharp.Core/array2.fs#L47-L50

So we need to decide how we will represent multidimensional arrays by ourselves. Of course this can be different per language if it improves performance. But it'd be nice to have a common implementation that we can reuse. At first I was thinking of using arrays of arrays, but now I believe it's probably better to use a single flattened array as F# multidimensional arrays cannot be sparse. This will have benefits in JS for numeric arrays and I assume it's more or less how multidimensional arrays in NumPy work too.

Thoughts? @ncave @dbrattli

alfonsogarciacaro avatar Jul 22 '22 23:07 alfonsogarciacaro

@alfonsogarciacaro From glancing at the code some time back, the IL emitted is just simple array element get/set, so we can probably reuse most of it. Either way would work, but yes, if we stick with flattened array implementation we can probably have single implementation across languages that have typed arrays like JS.

ncave avatar Jul 23 '22 00:07 ncave

@alfonsogarciacaro Arrays are a huge mess in Python except for NumPy, but I'm reluctant to take a direct dependency on a massive library like NumPy even if it would solve many problems (most non data science use-cases would not be happy with such a dependency). For Python there's the builtin array module that no-one uses and only handles numeric values. There's the bytearrays which is what you would normally use for interop with e.g networking libraries that requires the immutable bytes counterpart. Other types would need to be transpiled to lists as fallback. Then there's the generic type mess to handle as well when you don1t really know what you will receive. NumPy could be a compile time option you could enable but it should not be required by default imo.

For some time I had the plan to take a dependency on Expression (again) since I have added a typed array type there that hides some of the mess and different backing stores, but I have not been able to make a decision yet to use it for Fable or not: https://github.com/cognitedata/Expression/blob/main/expression/collections/array.py

There's also the problem of List.fs using Array.fs as backing for many of the operators. This is most likely a significant performance penalty for Python (vs other languages). Lists should probably be compiled to Python tuples instead which are backed by fast c-implementation.

PS: all of this is what blocked me back in February (and I have still not been able to make progress) 😬

dbrattli avatar Jul 23 '22 06:07 dbrattli

@dbrattli I would recommend not spending too much time trying to optimize lists. @ncave and I have already tried it several times and have failed 😉 I know it looks like a very good opportunity for performance since most F# devs use it as the default collection (syntax encourages that) but when we try to back the list with an array we improve the performance the cases of lists used as arrays, but decrease the performance of lists used as stacks (as lists should be used), which is not good. I think the only "hack" we have atm is making the tail mutable internally for faster reverse operations: https://github.com/fable-compiler/Fable/blob/c905697798185849f1cfe96cc36c2066dfee8380/src/fable-library/List.fs#L13-L18

It'd be nice to have a compiler option to use numpy for the arrays as this seems to be the standard for data science so it would make it easier to interact with existing Python scripts. But you're right it should probably not be the default.

alfonsogarciacaro avatar Jul 23 '22 06:07 alfonsogarciacaro

@alfonsogarciacaro

I would recommend not spending too much time trying to optimize lists.

I think @dbrattli just meant to say that the List module method implementations that depend on the Array module may be inefficient if the Array module implementation is inefficient.

ncave avatar Jul 23 '22 08:07 ncave

Ah, sorry, my bad! 😅

alfonsogarciacaro avatar Jul 25 '22 07:07 alfonsogarciacaro