rulinalg icon indicating copy to clipboard operation
rulinalg copied to clipboard

Iterator performance tips

Open bluss opened this issue 8 years ago • 4 comments

I'm briefly looking at rulinalg’s SliceIter. You can use the same kind of optimization used in ndarray's element iterator.

The brief idea of the "problem" is that this is efficent:

for i in rows
   for j in row(i)
       // access element at i, j

SliceIter::next() (just like ndarray's Iter::next) emulates this segmented traversal with a bit of a state machine. Unfortunately the compiler doesn't know how to turn this state machine back into two nested for loops(!)

Two main ideas I have used:

  1. Use the regular [T] iterator when the matrix is contiguous. A branch on this even inside the .next() method seems to work well, the conditional is lifted out of the loop.
  2. When you can't improve .next() for the non-contiguous case, you can at least special case Iterator::fold(). It is essentially the same idea as the .chain() change in https://github.com/rust-lang/rust/pull/37315 ; you recurse and fold a bunch of component iterators (since each row is contiguous, you just call the slice iterator's fold on each row in turn).

bluss avatar Nov 02 '16 15:11 bluss

Thank you so much for opening this!

I think it will take me a little while to digest this, but from what I do understand it will be really valuable.

AtheMathmo avatar Nov 02 '16 15:11 AtheMathmo

I was just reading through this again @bluss . How do you differentiate between the contiguous and non-contiguous cases in ndarray? A contributor suggested we add specialist types for contiguous and non-contiguous data. This would help us pull out a little more performance in exchange for more compexity. I'm starting to come around to this idea and was wondering if you do something similar?

Thanks!

AtheMathmo avatar Dec 02 '16 16:12 AtheMathmo

ndarray computes this from the dimensions and strides every time the array's data is requested as slice or iterator. It's in the plan to cache this computation with a flag, but it hasn't been done yet.

I don't know how many different memory layouts your library supports, but it's fewer than ndarray, so it's possible that the check is a very cheap comparison.

bluss avatar Dec 02 '16 16:12 bluss

Type-level information lets you save code bloat, I'm sure that's useful. I didn't want to have more array types.

bluss avatar Dec 02 '16 16:12 bluss