rulinalg
rulinalg copied to clipboard
Iterator performance tips
I'm briefly looking at rulinalg’s SliceIter. You can use the same kind of optimization used in ndarray's element iterator.
The brief idea of the "problem" is that this is efficent:
for i in rows
for j in row(i)
// access element at i, j
SliceIter::next() (just like ndarray's Iter::next) emulates this segmented traversal with a bit of a state machine. Unfortunately the compiler doesn't know how to turn this state machine back into two nested for loops(!)
Two main ideas I have used:
- Use the regular
[T]iterator when the matrix is contiguous. A branch on this even inside the.next()method seems to work well, the conditional is lifted out of the loop. - When you can't improve .next() for the non-contiguous case, you can at least special case
Iterator::fold(). It is essentially the same idea as the .chain() change in https://github.com/rust-lang/rust/pull/37315 ; you recurse and fold a bunch of component iterators (since each row is contiguous, you just call the slice iterator's fold on each row in turn).
Thank you so much for opening this!
I think it will take me a little while to digest this, but from what I do understand it will be really valuable.
I was just reading through this again @bluss . How do you differentiate between the contiguous and non-contiguous cases in ndarray? A contributor suggested we add specialist types for contiguous and non-contiguous data. This would help us pull out a little more performance in exchange for more compexity. I'm starting to come around to this idea and was wondering if you do something similar?
Thanks!
ndarray computes this from the dimensions and strides every time the array's data is requested as slice or iterator. It's in the plan to cache this computation with a flag, but it hasn't been done yet.
I don't know how many different memory layouts your library supports, but it's fewer than ndarray, so it's possible that the check is a very cheap comparison.
Type-level information lets you save code bloat, I'm sure that's useful. I didn't want to have more array types.