
Interoperability between array references of different libraries

Open grothesque opened this issue 2 weeks ago • 4 comments

The recent addition of ArrayRef to ndarray explicitly unifies the in-memory representation of arrays in ndarray to be something like a tuple of (shape, strides, data ptr). This allows implementing Deref from all array types to the reference type. This in turn allows functions to accept array parameters as nothing more than a pointer to such a tuple, no matter whether the actual type is an owned array, a view, or something else. This avoids a lot of needlessly generic code and greatly simplifies APIs, just like the built-in slices of Rust do.
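To make the Deref idea concrete, here is a minimal sketch restricted to two dimensions and `f64` elements. The names `ArrRef2`, `Owned2`, and `View2` are hypothetical and are not ndarray's actual definitions; the only point is that an owned array and a view can both dereference to the same (shape, strides, data ptr) struct, so that a function like `sum` needs no generics.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical 2-D reference type: (shape, strides, data pointer).
pub struct ArrRef2 {
    shape: [usize; 2],
    strides: [isize; 2],
    ptr: *const f64,
}

impl ArrRef2 {
    /// Bounds-checked element access.
    pub fn get(&self, i: usize, j: usize) -> Option<f64> {
        if i >= self.shape[0] || j >= self.shape[1] {
            return None;
        }
        let offset = i as isize * self.strides[0] + j as isize * self.strides[1];
        // SAFETY: the constructors below only build shape/stride pairs for
        // which every in-bounds index maps into the live allocation.
        Some(unsafe { *self.ptr.offset(offset) })
    }
}

/// Owned array: keeps the allocation alive next to the reference parts.
pub struct Owned2 {
    _data: Vec<f64>,
    parts: ArrRef2,
}

/// Borrowed view: the lifetime ties it to the array it was created from.
pub struct View2<'a> {
    parts: ArrRef2,
    _borrow: PhantomData<&'a [f64]>,
}

impl Owned2 {
    /// Row-major array built from a flat vector.
    pub fn from_vec(rows: usize, cols: usize, data: Vec<f64>) -> Self {
        assert_eq!(data.len(), rows * cols);
        let parts = ArrRef2 {
            shape: [rows, cols],
            strides: [cols as isize, 1],
            ptr: data.as_ptr(),
        };
        // Moving the Vec does not move its heap buffer, so `ptr` stays valid.
        Owned2 { _data: data, parts }
    }

    /// Transposed view: swap shape and strides, share the data.
    pub fn t(&self) -> View2<'_> {
        let [rows, cols] = self.parts.shape;
        let [s0, s1] = self.parts.strides;
        View2 {
            parts: ArrRef2 { shape: [cols, rows], strides: [s1, s0], ptr: self.parts.ptr },
            _borrow: PhantomData,
        }
    }
}

impl Deref for Owned2 {
    type Target = ArrRef2;
    fn deref(&self) -> &ArrRef2 {
        &self.parts
    }
}

impl Deref for View2<'_> {
    type Target = ArrRef2;
    fn deref(&self) -> &ArrRef2 {
        &self.parts
    }
}

/// Non-generic consumer: accepts anything that derefs to `ArrRef2`.
pub fn sum(a: &ArrRef2) -> f64 {
    let mut total = 0.0;
    for i in 0..a.shape[0] {
        for j in 0..a.shape[1] {
            total += a.get(i, j).unwrap_or(0.0);
        }
    }
    total
}
```

With this in place, `sum(&owned)` and `sum(&owned.t())` both compile through deref coercion, much like `&Vec<T>` coerces to `&[T]`.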

The next step of this harmonization would be realizing that, to a large extent, the different multi-dimensional array libraries for Rust work with a very similar model of what an array is. Perhaps a standard memory representation (i.e. order and data type for shape and strides) of array references could be established that is shared across multiple crates?

  • The mdarray crate has a Slice type that is similar to ndarray’s ArrayRef and supports arbitrary shapes and strides.
  • Faer has MatRef which is always two-dimensional, but supports arbitrary shapes and strides.
  • As far as I know nalgebra’s matrices are always densely packed in the first dimension, which prevents them from being reinterpreted as ArrayRefs at this point, but a reinterpretation as mdarray slices might be possible.
  • There are more, but you get the point.

Anyway, without looking too much into the technical details at this moment, I would like to bring up the question whether there is interest in some degree of harmonization. I believe that this has the potential to greatly benefit the Rust numerical array “ecosystem”. If the array references became compatible, libraries could add features that enable Deref implementations from the array ref of an external library to the native one.

@sarah-quinones, @fre-hu, @akern40, what do you think?

grothesque avatar Dec 10 '25 10:12 grothesque

i would be happy to discuss what a shared interop api could look like if there's interest in having that discussion. @andlon would probably also be interested

sarah-quinones avatar Dec 10 '25 10:12 sarah-quinones

Definitely count me in!

akern40 avatar Dec 10 '25 13:12 akern40

I agree, good initiative!

One idea is to have something similar to AsRef/AsMut in std. For example to have AsArrayRef<T>/AsArrayMut<T> with methods for various things like data pointer and shape.
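As a rough illustration of this suggestion, the sketch below defines an `AsArrayRef<T>` trait with accessors for the data pointer, shape, and strides. The method names, the choice of `isize` for strides, the `unsafe trait` marker, and the toy `ColMajor` type are all assumptions made for this example; none of them come from an existing crate.

```rust
/// Hypothetical interop trait in the spirit of `AsRef`.
///
/// It is an `unsafe trait` because consumers dereference the pointer:
/// implementers must guarantee that every in-bounds index, combined with
/// the reported strides, addresses a valid `T`.
pub unsafe trait AsArrayRef<T> {
    fn data_ptr(&self) -> *const T;
    fn shape(&self) -> &[usize];
    fn strides(&self) -> &[isize];
}

/// Library-agnostic element access built only on the trait.
pub fn get<T: Copy, A: AsArrayRef<T> + ?Sized>(a: &A, index: &[usize]) -> Option<T> {
    let (shape, strides) = (a.shape(), a.strides());
    if index.len() != shape.len() || strides.len() != shape.len() {
        return None;
    }
    let mut offset = 0isize;
    for ((&i, &n), &s) in index.iter().zip(shape).zip(strides) {
        if i >= n {
            return None;
        }
        offset += i as isize * s;
    }
    // SAFETY: relies on the contract of `AsArrayRef`.
    Some(unsafe { *a.data_ptr().offset(offset) })
}

/// Toy column-major matrix standing in for "some other library's" type.
pub struct ColMajor {
    data: Vec<f64>,
    shape: [usize; 2],
    strides: [isize; 2],
}

impl ColMajor {
    pub fn new(rows: usize, cols: usize, data: Vec<f64>) -> Self {
        assert_eq!(data.len(), rows * cols);
        ColMajor { data, shape: [rows, cols], strides: [1, rows as isize] }
    }
}

// SAFETY: `new` checks that the buffer holds rows * cols elements, and the
// strides describe exactly that dense column-major buffer.
unsafe impl AsArrayRef<f64> for ColMajor {
    fn data_ptr(&self) -> *const f64 {
        self.data.as_ptr()
    }
    fn shape(&self) -> &[usize] {
        &self.shape
    }
    fn strides(&self) -> &[isize] {
        &self.strides
    }
}
```

A trait like this keeps each library's memory representation private, at the cost of generic call sites, which is the trade-off against a shared Deref target.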

Having a common memory representation for Deref would of course be very good, but could also be difficult to achieve.

fre-hu avatar Dec 10 '25 17:12 fre-hu

Glad that you are interested!

Perhaps we can start with an overview of the array storage schemes supported/used by the different libraries. I’ll start with mdarray (@fre-hu please correct me if necessary), and with a few questions regarding the other libraries. I know that there exist even more libraries, but finding some kind of consensus will already be challenging with these four. Still, please feel free to invite/join.

mdarray

  • Number of dimensions: Can be a type-level constant if in the range 0..7. If it’s dynamic, it’s stored in memory as a usize value.
  • Shape: Each element of the shape is of type usize. If the number of dimensions is a type-level constant, each element of the shape can be either a compile-time constant, or dynamic and stored in memory. If the number of dimensions is dynamic, then all the elements of the shape are dynamic as well.
  • Strides: In the current version of the library two “layouts” are possible: Dense does not store anything in memory and implies row-major ordering of the axes without any gaps, and Strided is fully general and stores N isize values. Other layouts could be (re-)introduced, e.g. column-major. Question for @fre-hu: Would a fully general layout where each element can be a type-level constant be possible?
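The difference between the two layouts described above can be sketched as follows. This is not mdarray's actual API: the trait `Layout` and the types `Dense` and `Strided` here are simplified stand-ins with a const-generic rank, meant only to show that a dense row-major layout needs no stored strides because they follow from the shape.

```rust
/// Hypothetical layout abstraction for a fixed rank `N`.
pub trait Layout<const N: usize> {
    /// Strides (in elements) for an array of the given shape.
    fn strides(&self, shape: &[usize; N]) -> [isize; N];
}

/// Row-major without gaps: zero-sized, strides are derived from the shape.
pub struct Dense;

/// Fully general: stores `N` signed strides in memory.
pub struct Strided<const N: usize>(pub [isize; N]);

impl<const N: usize> Layout<N> for Dense {
    fn strides(&self, shape: &[usize; N]) -> [isize; N] {
        // The last axis is contiguous; each earlier axis steps over
        // everything to its right.
        let mut strides = [1isize; N];
        for k in (0..N.saturating_sub(1)).rev() {
            strides[k] = strides[k + 1] * shape[k + 1] as isize;
        }
        strides
    }
}

impl<const N: usize> Layout<N> for Strided<N> {
    fn strides(&self, _shape: &[usize; N]) -> [isize; N] {
        self.0
    }
}
```

For a shape of `[2, 3, 4]`, `Dense` yields strides `[12, 4, 1]` while occupying no memory itself; `Strided<3>` occupies three `isize` values.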

ndarray

Looking at struct ArrayParts, my impression is that the same types are used for shapes and strides and that they must implement the Dimension trait. This leaves two questions open for me:

  • Is the number of dimensions stored twice for each array (ref), namely once in the shape, and once in the strides?
  • Are negative strides possible?

faer

nalgebra

(@sebcrozet, perhaps you are interested in this discussion as well?)

error[E0308]: mismatched types
   --> src/mut_/mod.rs:168:34
    |
167 |         nalgebra::base::DMatrixViewMut::from_slice_with_strides_generic(
    |         --------------------------------------------------------------- arguments to this function are incorrect
168 |             s, na_rows, na_cols, na_stride0, na_stride1,
    |                                  ^^^^^^^^^^ expected `Const<1>`, found `Dyn`
    |
    = note: expected struct `nalgebra::Const<1>`
               found struct `nalgebra::Dyn`

grothesque avatar Dec 10 '25 19:12 grothesque

@grothesque you've got it mostly right for ndarray's ArrayRef. However, there is an open RFC, #1506, that summarizes my thoughts so far on how we want to change how ndarray handles Dimension, etc. So as we talk about our common goals, I'd be ok with setting a "goal state" that ndarray can reach for that isn't necessarily immediately compatible with our current code.

Also, I agree with @fre-hu that a good first goal would be interop rather than common memory format. As far as I see it, there are essentially 3 "attributes" for interop that we'd want to capture:

  • Is the data mutable?
  • Are the shapes/strides mutable?
  • Does the interop transformation cause a data copy?

All told, that would make for 8 traits. Not terrible, but if we wanted to skip the shape/stride mutability then we could reduce it to 4.

akern40 avatar Dec 14 '25 21:12 akern40

I would be interested in this. My use case is that I maintain an ML inference runtime which internally uses its own tensor library but most users will be more used to working with ndarray. It would be useful to have a way to accept an array from ndarray or other libraries without needing a direct dependency on each library. Additionally it would be useful to have a way to convert owned arrays containing inference results back into the source array library type.

The representation I use internally for layouts is one of two types, abstracted by a Layout trait:

struct DynLayout {
  shape_and_strides: SmallVec<[usize; 8]>,
}

struct NdLayout<const N: usize> {
  shape: [usize; N],
  strides: [usize; N],
}

The backing storage I use is (ptr, len) for array views (note: not a slice) and Vec<T> for owned arrays. So this is very similar to ndarray, except for not supporting negative strides. For my use case it isn't critical that an interoperable representation has to be a reference, as opposed to a type more like ndarray's ArrayView, as creating fixed-rank types with their own shape/stride arrays is cheap.
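For readers comparing representations, here is a minimal sketch of how such a `Layout` trait could tie the two structs together. It is an illustration under assumptions, not the runtime's real code: `Vec` stands in for `SmallVec` to stay dependency-free, the packed vector is assumed to hold the shape in its first half and the strides in its second half, and the `offset` method is hypothetical.

```rust
/// Hypothetical common interface over dynamic-rank and fixed-rank layouts.
pub trait Layout {
    fn shape(&self) -> &[usize];
    fn strides(&self) -> &[usize];

    /// Linear element offset of `index`, or `None` if out of bounds.
    fn offset(&self, index: &[usize]) -> Option<usize> {
        let (shape, strides) = (self.shape(), self.strides());
        if index.len() != shape.len() {
            return None;
        }
        let mut offset = 0usize;
        for ((&i, &n), &s) in index.iter().zip(shape).zip(strides) {
            if i >= n {
                return None;
            }
            offset += i * s;
        }
        Some(offset)
    }
}

/// Dynamic rank. Assumption: first half is the shape, second half the strides.
pub struct DynLayout {
    pub shape_and_strides: Vec<usize>,
}

/// Rank fixed at compile time.
pub struct NdLayout<const N: usize> {
    pub shape: [usize; N],
    pub strides: [usize; N],
}

impl Layout for DynLayout {
    fn shape(&self) -> &[usize] {
        &self.shape_and_strides[..self.shape_and_strides.len() / 2]
    }
    fn strides(&self) -> &[usize] {
        &self.shape_and_strides[self.shape_and_strides.len() / 2..]
    }
}

impl<const N: usize> Layout for NdLayout<N> {
    fn shape(&self) -> &[usize] {
        &self.shape
    }
    fn strides(&self) -> &[usize] {
        &self.strides
    }
}
```

Because the strides are unsigned, offsets can only grow from the base pointer, which is why negative strides (as in a reversed ndarray view) have no direct equivalent in this scheme.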

robertknight avatar Dec 14 '25 22:12 robertknight