ndarray
ndarray copied to clipboard
Plan for dimension types
I've been thinking about the dimension types for a while. Two things I don't like about the current implementation are that:
- Strides are represented as
usizeand have to be converted toisizeevery time they're used. This is error-prone and confusing. - It seems a little weird that the dimension type itself (instead of an associated type) has a representation. For example, "2-D" doesn't necessary imply a specific representation to me.
What I'd like to do is something like the following:
pub trait Dimension
where
for<'a> Into<Self::OwnedUsize> for &'a Self::BorrowedUsize,
for<'a> Into<Self::OwnedIsize> for &'a Self::BorrowedIsize,
{
type OwnedUsize: AsRef<Self::BorrowedUsize> + AsMut<Self::BorrowedUsize> + AsRef<[usize]> + AsMut<[usize]>;
type BorrowedUsize: ?Sized + AsRef<[usize]> + AsMut<[usize]>;
type OwnedIsize: AsRef<Self::BorrowedIsize> + AsMut<Self::BorrowedIsize> + AsRef<[isize]> + AsMut<[isize]>;
type BorrowedIsize: ?Sized + AsRef<[isize]> + AsMut<[isize]>;
}
pub struct Ix2;
pub struct IxDyn;
impl Dimension for Ix2 {
type OwnedUsize = [usize; 2];
type BorrowedUsize = [usize; 2];
type OwnedIsize = [isize; 2];
type BorrowedIsize = [isize; 2];
}
impl Dimension for IxDyn {
type OwnedUsize = IxDynImpl<usize>;
type BorrowedUsize = [usize];
type OwnedIsize = IxDynImpl<isize>;
type BorrowedIsize = [isize];
}
pub trait IntoDimOwnedUsize {
type Dim: Dimension;
fn into_dim_owned_usize(self) -> Self::Dim::OwnedUsize;
}
pub trait AsDimBorrowedUsize {
type Dim: Dimension;
fn as_dim_borrowed_usize(&self) -> &Self::Dim::BorrowedUsize;
}
pub trait IntoDimOwnedIsize { ... }
pub trait AsDimBorrowedIsize { ... }
pub struct ArrayBase<S, D>
where
S: Data,
{
data: S,
ptr: *mut S::Elem,
dim: D::OwnedUsize,
strides: D::OwnedIsize,
}
impl<A, S, D> ArrayBase<S, D>
where
S: Data<Elem = A>,
D: Dimension,
{
pub fn shape(&self) -> &D::BorrowedUsize {
// ...
}
pub fn strides(&self) -> &D::BorrowedIsize {
// ...
}
}
Once Rust has generic associated types, we can simplify this to:
pub trait Dimension
where
for<'a, T: Clone> Into<Self::Owned<T>> for &'a Self::Borrowed,
{
type Owned<T>: AsRef<Self::Borrowed<T>> + AsMut<Self::Borrowed<T>> + AsRef<[T]> + AsMut<[T]>;
type Borrowed<T>>: ?Sized + AsRef<[T]> + AsMut<[T]>;
}
pub struct Ix2;
pub struct IxDyn;
impl Dimension for Ix2 {
type Owned<T> = [T; 2];
type Borrowed<T> = [T; 2];
}
impl Dimension for IxDyn {
type Owned<T> = IxDynImpl<T>;
type Borrowed<T> = [T];
}
pub trait IntoDimOwned<T> {
type Dim: Dimension;
fn into_dim_owned(self) -> Self::Dim::Owned<T>;
}
pub trait AsDimBorrowed<T> {
type Dim: Dimension;
fn as_dim_borrowed(&self) -> &Self::Dim::Borrowed<T>;
}
pub struct ArrayBase<S, D>
where
S: Data,
{
data: S,
ptr: *mut S::Elem,
dim: D::Owned<usize>,
strides: D::Owned<isize>,
}
impl<A, S, D> ArrayBase<S, D>
where
S: Data<Elem = A>,
D: Dimension,
{
pub fn shape(&self) -> &D::Borrowed<usize> {
// ...
}
pub fn strides(&self) -> &D::Borrowed<isize> {
// ...
}
}
I'd also add various type-level arithmetic operations on the dimension types, which are necessary for things like co-broadcasting (trait PartialOrdDim) and fold_axes (trait SubDim).
We can also add Shape<T>, Strides<T>, and Index<T> thin wrapper types.
This approach would resolve things like #489 and this comment on #367.
Thoughts?
It looks cleaner, which is always good API-wise, but I have to admit that the whole dimension-shape situation is a little big foggy to me (especially since reading #367, I feel there is a lot I am missing there between the lines).
To get a proper understanding of what this implies could you frame it with respect to the other dimension-related traits?
isize vs usize genericity is a low level problem. It would be nice to solve. Before those come other problems I think: possible division of a trait for indices vs a trait dimensions (current solution uses the same for both which is pragmatic but limiting), and like you point out, making the dimension-related traits more obvious and hopefully fewer in number.
I was rereading https://github.com/rust-ndarray/ndarray/issues/367 and I think, even though there might some technical difficulties that will have to be ironed out, that what @termoshtt was proposing makes intuitive sense from an API point of view.
I would actually push it one step further, by further decomposing what we know call Dimension into two things:
- a
NdimorDimensionalitytrait, used to check at compile time how many dimensions an array has. We could resort to something similar to what nalgebra does using typenum; - a
Shapetrait, as inNumPy, providing information on how many elements there are in each axis of the array.
Ndim would be the only type, apart from element type, to appear in the ArrayBase declaration, while Shape will be inferred from the data passed to the constructor.
A MemoryLayout would then take the place of our Strides-related existing functionalities.
From a user point of view, I think this would result more "intuitive". What are your thoughts?
Yeah, I think it makes sense to separate the type that implements Ndim/Dimensionality from the representation of the shape/strides. Until Rust has generic associated types, I'm thinking about a simpler approach than I proposed earlier:
use num_traits::Zero;
use smallvec::SmallVec;
// Replaces the "number of dimensions" portion of `Dimension`
pub trait Dimensionality {
type Shape: DimRepr<Elem = usize>;
type Strides: DimRepr<Elem = isize>;
const NDIM: Option<usize>;
}
pub struct D1;
pub struct D2;
pub struct D3;
pub struct D4;
pub struct D5;
pub struct D6;
pub struct Ddyn;
impl Dimensionality for D2 {
type Shape = [usize; 2];
type Strides = [isize; 2];
const NDIM: Option<usize> = Some(2);
}
impl Dimensionality for Ddyn {
type Shape = DynRepr<usize>;
type Strides = DynRepr<isize>;
const NDIM: Option<usize> = None;
}
// Replaces the "representation" portion of `Dimension`
pub trait DimRepr {
type Elem: Clone;
type Dim: Dimensionality;
fn ndim(&self) -> usize;
fn from_slice(slice: &[Self::Elem]) -> Self;
fn as_slice(&self) -> &[Self::Elem];
fn as_mut_slice(&mut self) -> &mut [Self::Elem];
}
impl<T: Clone> DimRepr for [T; 2] {
type Elem = T;
type Dim = D2;
fn ndim(&self) -> usize { 2 }
fn from_slice(slice: &[T]) -> Self {
assert_eq!(slice.len(), 2);
[slice[0].clone(), slice[1].clone()]
}
fn as_slice(&self) -> &[T] { &self[..] }
fn as_mut_slice(&mut self) -> &mut [T] { &mut self[..] }
}
pub struct DynRepr<T>(SmallVec<[T; 4]>);
impl<T: Clone> DimRepr for DynRepr<T> {
type Elem = T;
type Dim = Ddyn;
fn ndim(&self) -> usize { self.0.len() }
fn from_slice(slice: &[T]) -> Self { DynRepr(slice.into()) }
fn as_slice(&self) -> &[T] { &self.0 }
fn as_mut_slice(&mut self) -> &mut [T] { &mut self.0 }
}
pub struct ArrayBase<S, D>
where
S: Data,
D: Dimensionality,
{
data: S,
ptr: *mut S::Elem,
shape: D::Shape,
strides: D::Strides,
}
impl<A, S, D> ArrayBase<S, D>
where
S: Data<Elem = A>,
D: Dimensionality,
{
// Equivalent to old zeros method. The only real difference is that
// `ShapeBuilder` has been renamed to `LayoutBuilder`.
pub fn zeros<L>(layout: L) -> Self
where
A: Clone + Zero,
S: DataOwned,
L: LayoutBuilder<Dim = D>,
{
unimplemented!()
}
/// Errors if [...] any of the strides are negative.
pub fn from_shape_vec<L>(layout: L, v: Vec<A>) -> Result<Self, ShapeError>
where
L: Into<StridedLayout<D>>,
{
unimplemented!()
}
pub fn into_shape<E>(self, shape: E) -> Result<ArrayBase<S, E::Dim>, ShapeError>
where
E: IntoShape,
{
unimplemented!()
}
pub fn shape(&self) -> &D::Shape {
&self.shape
}
pub fn strides(&self) -> &D::Strides {
&self.strides
}
}
// Replaces the old `IntoDimension` trait
pub trait IntoShape {
type Dim: Dimensionality;
fn into_shape(self) -> <Self::Dim as Dimensionality>::Shape;
}
pub trait IntoStrides {
type Dim: Dimensionality;
fn into_strides(self) -> <Self::Dim as Dimensionality>::Strides;
}
// Equivalent to the old `Shape` struct
pub struct ContigLayout<D: Dimensionality> {
shape: D::Shape,
is_c: bool,
}
// Equivalent to the old `StrideShape` struct
pub struct StridedLayout<D: Dimensionality> {
shape: D::Shape,
strides: D::Strides,
is_custom: bool,
}
// Nearly equivalent to the old `ShapeBuilder` trait
pub trait LayoutBuilder {
type Dim: Dimensionality;
fn into_contig_layout(self) -> ContigLayout<Self::Dim>;
fn f(self) -> ContigLayout<Self::Dim>;
fn set_f(self, is_f: bool) -> ContigLayout<Self::Dim>;
fn strides<St>(self, strides: St) -> StridedLayout<Self::Dim>
where
St: IntoStrides<Dim = Self::Dim>;
}
This is similar to the existing implementation; the primary changes are:
- The type representing the number of dimensions (the type that implements
Dimensionality) is separate from the representation of the shape/strides (the types that implementDimRepr). - Strides are now always handled as
isizeinstead of casting to/fromusize.
Everything else about the existing implementation stays pretty much the same, other than renaming some things.
I've chosen to expose the representation of fixed-dimension shape/strides as fixed-length arrays, but that's not strictly necessary.
The remaining things I dislike about this are:
- The name
IntoShapecould be confusing for things other than shapes. (See.permuted_axes()for an example of this. IMO the nameIntoDimensionalityin this case is just as confusing asIntoShape. It would be nice to have anIntoAxestrait or something.) - It would be nice to be able go from
D: Dimensionalityto the corresponding type that implementsDimRepr<Elem = T>for arbitraryT, not justusize/isize. With currentndarrayand with this proposal, if you want to represent a collection ofndimitems that aren'tusize/isizegiven only theD: Dimensionality, you have to use aVec.
I was going to say first that in general:
- typenum is out unless we can show that it is easy to use for ndarray users
- we need to be ready for const generics, which could arrive in 2019
- Solution needs to state its goals and tradeoffs
- Presenting a simple interface to users is a good goal. I think this should be the starting point.
- More exact associated types seems to be good to have for advanced users, but in a way even detrimental to the goal just above
I like your new approach jturner. It's good but doesn't seem complete, unsure if it solves enough problems.
I'll ask about a concrete problem in Array::zeros
fn zeros<Sh>(new_shape: Sh) where Sh: ShapeBuilder<Dim=D>
This would be better with a bound like Sh: IntoContiguousShape<Dim=D>, or anything that better evokes exactly what kind of argument is needed here.
For array constructors we have three kinds of inputs:
- Tuple-ish: Layout description of axes
- Shape: Layout description of axes and C/F order
- StridedShape: Layout description of axes and custom strides
For natural reasons we only allow (1) and (2) for most constructors, and (3) for some advanced constructors, and this is the reason we use the type system to only accept (3) where we can support it. Current bounds accomplish this, but the traits and trait names don't make it very clear.
For me, "Layout" means something like C/F memory layout, it doesn't mean the lengths of each axis, so I'm unsure about the name.
I like @jturner314's design - wouldn't it be better though to use two separate traits for Shapes and Strides? (even if they might offer exactly the same methods initially)
Currently we are constraining both of them using DimRepr, but with a Shape trait and a Stride trait we would retain more flexibility and it would probably allow us to implement more specific functionality for each one of them separately, without causing confusion to the end user.
Having worked with numpy and also some Eigen3 in C++ I was wondering if there are plans to support having the size (of some dimensions) as part of the type. If I recall correctly it was/is possible in Eigen3 to have a type that has fixed (compile time) size in some dimensions and dynamic size in others.
Something along with let a : Array<f32, Dim<[IxDyn, 5, IxDyn, 2]> = .... I think Eigen3 used isize for the shapes and the special value -1 to indicate a dynamic dimension in the types.
Edit: AFAI this is not (yet) possible with rust, as it would either require variadic const generics or const generics to support more than usize,bool andchar values. -- But one can of course work around the missing variadics by handcoding up to a reasonal number: struct Dim2<const S1:usize, const S2:usize> ...