rulinalg
rulinalg copied to clipboard
In place transposition for Matrix (and MatrixSliceMut)
From @AtheMathmo on July 6, 2016 22:10
We currently have a transpose(&self) -> Matrix<T> method. It would be nice if we could do this operation in place, i.e. in_place_transpose(&mut self).
We can achieve this by swapping places in the data. We could do this fairly easily by utilizing the existing slice::swap method. This is acceptable but will include unnecessary bound checking, to speed things up we could use the underlying technique from that swap method:
unsafe {
let pa: *mut T = self.get_unchecked_mut(i);
let pb: *mut T = self.get_unchecked_mut(j);
ptr::swap(pa, pb);
}
It would be nice to have this method in place for both Matrix and MatrixSliceMut. The Matrix implementation could simply call self.as_mut_slice().in_place_transpose().
To resolve this issue
- In place transposition for square
Matrix. - In place transposition for square
MatrixSliceMut. - Tests for both.
Bonus points
- Using
ptr::swapto avoid bounds checking. - Explore vectorization and cache-line utilization.
- Explore more advanced techniques.
- Use permutations to transpose non-square matrices.
Copied from original issue: AtheMathmo/rusty-machine#97
These slides provide an overview of a good (and relatively simple) implementation: http://www.netlib.org/utk/people/JackDongarra/CCDSC-2014/talk35.pdf
There is a paper linked in the article which may be a valuable read.
Edit: I think I'll try implementing this paper. It looks relatively simple to get working!
I've got an initial (messy, slow) implementation of the algorithm in the linked paper here: https://github.com/theotherphil/rulinalg/commit/4f2355992487856c1e02662627efa9d9f4c875fa
I'll tidy and speed it up before creating a pull request.
Hmmm. I've just found this: https://athemathmo.github.io/2016/08/29/inplace-transpose.html
Looks like you've already done this! But it doesn't appear to have been merged.
Hey @theotherphil ! Sorry that I'm slow to reply.
Yes I had already tried implementing this and you're right that it is slower than out-of-place transpose, which is expected. I had not merged it yet due to the need for in-place asm and/or some 128 bit integer work. Once these are easier to support on all systems I'd be happy to bring the old branch up to master and merge.
Sorry that I didn't comment sooner to let you know.
I was googling for 128 bit integer support in Rust when I found your article! It looks like 128 bit integers should be getting stabilised soon. Fingers crossed.
The fast divide trick is already covered by an existing (nightly-only) crate. Might be worth reusing that when u128 is stabilised. https://github.com/fulmicoton/fastdivide