ndarray icon indicating copy to clipboard operation
ndarray copied to clipboard

Implement non-axis windows and chunks iterators

Open Robbepop opened this issue 8 years ago • 54 comments

Already implemented features and todos:

  • [x] ExactChunks and ExactChunksMut aswell as their NdProducer impl
  • [ ] RaggedChunks or simply Chunks aswell as their NdProducer impl
  • [x] Windows
  • [x] NdProducer impl for Windows

Implement n-dimensional windows and chunks iterators.

Windows-iterators are especially useful for convolutional neural networks while chunk iterations may prove useful for paralell algorithms.

N-Dimensionality for both iterators use N-dimensionally shaped items.

For example for a 1-dimensional Array the windows and chunks items are also 1-dimensional. For a 2-dimensional Array the windows and chunks items are 2-dimensional, and so on.

For windows iterator it could be useful to support two different kinds of windows iterators. One for "valid" windows only that iterate only over windows within a valid region. For example in a valid-windows iterator over a 10x10 shaped 2d-array and window sizes of 4x4 the iterator would iterate totally over (10-4+1)x(10-4+1) items of size 4x4. A full-windows iterator would also iterate over the windows with invalid elements (on the edges of the array) and would simulate missing elements with zero elements.

Semi-nice visualization of n-dimensional chunks and windows iterators.

A similar approach could be used for n-dimensional chunks that iterate beyond the array sizes and replace missing/invalid array elements with zero elements.

Robbepop avatar Mar 14 '17 00:03 Robbepop

That sounds good, and I love a good illustration to go together with expressing an idea.

One concern I have here is that it's probably still not an efficient way to implement convolution, to go through this.

bluss avatar Mar 14 '17 00:03 bluss

That sounds good, and I love a good illustration to go together with expressing an idea.

Thanks! :)

One concern I have here is that it's probably still not an efficient way to implement convolution, to go through this.

True! For performance the most promising implementation for convolution is to use a GPGPU approach. But what if one doesn't want to go that far? The main question for this issue for me is, if it is possible to provide such iterators with equal performance as extern definitions. But as far as I can imagine such iterators would perform much better if implemented internally in this crate with hidden data of ArrayBase.

Also: What implementation would fit best? We could implement such iterator for example by using an item-sized local Array within the iterator to reuse the memory for copying the data out of the nd-array.

~~Or we could also use iterators over iterators over the nd-array and thus avoid copying of elements completely but I don't know if this approach is faster since I do not know ndarray internals too well.~~ As far as I can think the strike-throughed idea is not feasable if we want ArrayView as items for the new iterators.

Robbepop avatar Mar 14 '17 00:03 Robbepop

Between as_ptr, as_mut_ptr, and the .axes() iterator (each axis' length and stride), all the information of the raw representation is available, except for issues related to ownership, and they don't matter here. So I think it's possible to do it outside of the crate, but not so natural.

bluss avatar Mar 14 '17 00:03 bluss

Between as_ptr, as_mut_ptr, and the .axes() iterator (each axis' length and stride), all the information of the raw representation is available, except for issues related to ownership, and they don't matter here. So I think it's possible to do it outside of the crate, but not so natural.

Okay, good to know about the currently possible "hack", but yeah it wouldn't be a very natural way to encode this behavious and would require unsafe rust to deref the pointers which is unnecessary.

For me both proposed iterators would be great and would be of great use for my projects!

Robbepop avatar Mar 14 '17 01:03 Robbepop

Sure, the window iterator is a good idea. If I understand correctly, chunks has a chunk size in each dimension? .axis_chunks_iter() currently covers it if you just want to chunk it along one dimension, (using the full width of every other).

I also want to underline that using the unsafe methods or lower level code is not a hack but a feature, we need to allow it so that people can use it to improve their programs if needed.

bluss avatar Mar 14 '17 01:03 bluss

For my proposed idea I cannot use the axis_chunks_iter since it has a different behaviour. The items of the chunks iterator I proposed has the same dimension as the underlying Array. It just acts like the 1-dimensional chunks iterator for slices but for n-dimensions.

For 2-dimensions I could really use this for the pooling function within the convolution.

And for a 3 dimensional array of size 10x10x10 I could for example chunk it with 5x5x5 cubes and would iterate over (10³ / 5³) chunks in total.

For both iterators it should be decided whether we also are in need of full-iterator that (as described) above iterate over the full area, instead of only the valid area. This is possible since in ndarray elements are all NDFloat which is Copy and has a default value 0.0 which we could simply use for invalid elements.

Robbepop avatar Mar 14 '17 01:03 Robbepop

axis_chunks_iter gives views that have the same dimensionality (1d → 1d, 2d → 2d) though.

bluss avatar Mar 14 '17 01:03 bluss

axis_chunks_iter gives views that have the same dimensionality (1d → 1d, 2d → 2d) though.

Okay I will have to take another look into the docs. Maybe I have overseen things and it could really be used to simulate the chunks iterator that I proposed here.

Robbepop avatar Mar 14 '17 01:03 Robbepop

I have found a semi-nice visualization of the proposed chunks iterator for 2 dimensions here.

Robbepop avatar Mar 14 '17 01:03 Robbepop

Right, the current axis_chunks_iter cuts stripes, not squares. So they are different in that way.

bluss avatar Mar 14 '17 01:03 bluss

How do you feel about the importance of a chunks iterator facility for ndarray?

Robbepop avatar Mar 14 '17 02:03 Robbepop

Not important, but we can try to do it

bluss avatar Mar 14 '17 20:03 bluss

How do you feel about me trying to do it via PR? (This would probably take a while since I have to wrap my head around the libraries internals.)

Robbepop avatar Mar 14 '17 21:03 Robbepop

Go ahead.

  1. Remember how windows for slice has no mutable variant? That needs to be true for us too.

  2. Maybe I should tell about the trick that we usually use for iterators: We reuse the basic element iterator (Baseiter) . How to construct a chunks iterator? Make an iterator that visits exactly the top right corner of each chunk (This can be done using slicing with a stride, which we can also emulate manually). For each element, you map it to an ArrayView that starts in that element (its pointer is the arrayview's pointer!)

bluss avatar Mar 14 '17 21:03 bluss

Okay I will try! :)

Robbepop avatar Mar 14 '17 23:03 Robbepop

@bluss

Today I looked into the ndarray code to wrap my head around the iterator and especially the BaseIterator and ArrayBase code.

It is not yet very clear to me but could it be possible to design a windows or chunks iterator without any copy operations into an iterator-internal owning ArrayBase object by exploiting the strides of the ArrayBase and just giving out cheap instantiations of ArrayViews that have a proper ptr and strides?

I need feedback on this.

Robbepop avatar Mar 22 '17 10:03 Robbepop

I recognize it's not super easy to implement a new iterator, since it's so much about ndarray implementation details.

I think you would want do do something like InnerIter: have a Baseiter and also extra information. However for chunks/windows it needs to have a whole dimension/stride set, not just a single number (Basically D instead of an Ix). https://github.com/bluss/rust-ndarray/blob/master/src/iterators/mod.rs#L400

bluss avatar Mar 22 '17 11:03 bluss

Yep I got that point with the D: Dimension strides for the ArrayView but I was just curious if the whole thing is really implemented in that way as I thought that we have to store an internal copy of all the floats within the current iteration window or chunk but with this architecture all that copying is not needed and thus the whole iteration should be a lot more efficient than I thought in the first place. ^.^

Let's see if I can get it working. :S

Robbepop avatar Mar 22 '17 11:03 Robbepop

If I think of the chunks iterator and am not mistaken I think that

We use sliced dimension and sliced stride to initialize Baseiter (One can slice in place using the array method islice, but it can be done with lower level functions as well). Each yielded ArrayView will have the dimensions of a chunk (clipped to the full array's boundaries) and strides that are equal to those of the original array(!)

bluss avatar Mar 22 '17 13:03 bluss

This code could also be read to understand how to use a slice / subview of an array as a “map” of array views.

Imagine we have an array of N dimensions and want to cut it into separate one-dimensional “sticks” (or “lanes”).

That's what these methods do. Only difference is they use a callback instead of an iterator.

https://github.com/bluss/rust-ndarray/blob/d9492d7b5812e47dfeb23d062f2acd33a62b41cd/src/impl_methods.rs#L1371-L1386

bluss avatar Mar 22 '17 13:03 bluss

I think I understood how I can utilize islice to create the iterators. On iterator instantiation I create an ArrayView of the n-dimensional Array under iteration and with islice I mutate the internal ArrayView and give out references to it on a call to next.

I could also use slice method to create a new instance of the ArrayView instead of mutating an old one. The advantage here would be that I could easily move the newly created ArrayView on a call to next and would yield values instead of references on iteration which would be nice in case of non-owning ArrayViews.

The part that is currently unclear to me is, how I can perform the n-dimensional iteration of all possible Slices of the chunked or windowed Array. Is there built-initerator functionality for that to yield all possible indices of the array? - Is the Indices iterator the right utility for this approach?

We need an efficient way for iterating over all possible slices or indices. I saw that the iteration Item for Indices is a Vec<usize> which implies tons of potential malloc invokations. We should avoid that I think.

Robbepop avatar Mar 24 '17 14:03 Robbepop

The iteration Item for Indices is D::Pattern where D is a dimension.

That means if D is Ix2, the item is (usize, usize). If D is IxDyn, dynamic number of axes, the item is indeed Vec<usize>. It's now IxDyn works, it uses a lot of dynamic allocations, but it's not the usual way you'd use ndarray. Ix1, Ix2 etc are the common ones.

bluss avatar Mar 24 '17 14:03 bluss

This is maybe not so straightforward, you could leave this to me if you want.

bluss avatar Mar 24 '17 14:03 bluss

That means if D is Ix2, the item is (usize, usize).

That's awesome!

This is maybe not so straightforward, you could leave this to me if you want.

I would like to try it out now if this is okay for you.

So I think I can design Windows like so:


struct Windows<'a, A: 'a, D: 'a + Dimension> {
    /// view of the entire array under iteration
    view   : ArrayView<'a, A, D>,
    /// iterator for all possible valid window indices
    indices: Indices<D>,
    /// the shape of the windows
    shape  : Shape<D>
}

pub fn windows<'a, A: 'a, D: 'a + Dimension>(array: ArrayView<'a, A, D>, shape: Shape<D>) -> Windows<'a, A, D> {
    Windows{
        view:    array.view(),
        indices: indices_of(&array), // restrict array to area for valid windows shapes:
                                     // e.g. array.shape() == [10, 8], window shape is [5, 4], then indices, area
                                     // should be (0..5, 0..4).
        shape:   shape
    }
}

impl<'a, A: 'a, D: 'a + Dimension> Iterator for Windows<'a, A, D> {
    type Item = ArrayView<'a, A, D>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.indices.next() {
            None => None,
            Some(idx) => {
                let slice_arg = slice_at_idx_of_shape(self.shape, idx); // function that converts a shape and a given index into a SliceArg
                Some(self.view.slice(slice_arg)) // return window with given shape at current index
            }
        }
    }
}

Questions for me to solve:

  • How to restrict the indices iteration area to valid windows for given shape s.
  • How to convert a given idx in next into a shape of the given shape `s.

How close or far am I from your theory behind the codes for the windows iterator?

To create a windows-slice for slice method from a given shape Dim and an index (offset) Dim without dynamic memory allocation in the special case of 2 dimensions I came to this code:

fn slice_at_idx_of_shape_2d<D: Dimension>(shape: D, idx: D) -> [Si; 2] {
    [
        Si(idx[0] as isize, Some((idx[0] + shape[0]) as isize), 1),
        Si(idx[1] as isize, Some((idx[1] + shape[1]) as isize), 1)
    ]
}

Provided that this implementation is correct it would be best imo to create a 1D, 2D, ... 6D version for it and a generalize dynamic allocation requiring method to avoid dynamic allocation up to 5 dimensional arrays. With strides we might be able to easily implement chunks-iterators.

Robbepop avatar Mar 24 '17 15:03 Robbepop

I have created a prototype implementation of a generic, malloc avoiding into_si_with_offset trait implementation which can be found here.

This makes it possible to map a pair of shape: Dim (windows/chunks size) and offset: Dim (from indices) into a slice argument for slicing.

Robbepop avatar Mar 24 '17 19:03 Robbepop

Making SliceArg available like that is good

bluss avatar Mar 24 '17 19:03 bluss

I have sort of gone into this territory with Zip, a chunk iterator might show up there.

bluss avatar Mar 27 '17 21:03 bluss

I haven't had quite enough time to work on this the last few days. It is nice that you also worked on another path there which looks really promising! Maybe I can actually use this zip a lot in order to improve already existing Prophet code.

I have sort of gone into this territory with Zip, a chunk iterator might show up there.

Does that mean that you are able to implement a Chunks iterator using this new facility?

I came to the conclusion that a Windows iterator is a lot easier to implement than a Chunks iterator. Basically the template you just gave me needs only little changes for it to be a valid Windows iterator.

Robbepop avatar Mar 27 '17 23:03 Robbepop

What I want now with Zip is that one can zip together chunk producers and arrays.

If an array is dim (10, 10) and we get a chunks producer with (2, 2) chunks, the chunks producer has dim (5, 5) and we can zip it with other (5, 5) arrays.

Iterators are stateful, and handling the halfway-iterated state is troublesome. So I'm going the route (like rayon) of having “Producers” (Think of that as IntoIterator for the array zip). A producer can be split etc for parallelization. Some objects like Chunks is a Producer and IntoIterator, other objects like AxisIter can implement Producer and Iterator directly (because the axis iter is one dimensional, this is not a problem).

bluss avatar Mar 28 '17 00:03 bluss

Sounds very nice so far - especially the improved integration with rayon! The question that I am asking myself is: Do you still need a Chunks and Windows iterator implementation after this huge commit or is it self-contained in it?

If you no longer need it, I can at least serve you with the test cases implementation I came up with. There are many for chunks: (just for 2 dimensional arrays so far)

  • only even chunks (easy one)
  • odd chunks on the right handside (horizontal) (hard)
  • odd chunks on the vertical axis (hard)
  • odd chunks on both axes
  • chunks of size 1x1 -> should be same as normal iteration. So should this do some warning on use?
  • chunks of size 0 -> error!
  • chunks of size greater than the underlying array -> what is the expected behaviour for this?

Odd chunks are shaped so fit into the remaining valid region, as in the standard chunks iterator.

We could maybe also implement an EvenChunks iterator besides this one, that would simply skip odd chunks or panic on misusage (what you like better) which should give us some performance when we do not need to cover these many edge cases. What do you think?

Robbepop avatar Mar 28 '17 01:03 Robbepop