rulinalg
Divide and Conquer Parallelism
From @AtheMathmo on April 17, 2016 0:55
Would be nice if we could get things running on more than one core! I've been playing around with getting this working for matrix multiplication for a while. Now that we have MatrixSlice we can get something decent working. My initial tests produced the following benchmarks:
test linalg::matrix::mat_mul_128_100 ... bench: 221,813 ns/iter (+/- 28,576)
test linalg::matrix::mat_paramul_128_100 ... bench: 213,257 ns/iter (+/- 16,667)
test linalg::matrix::mat_blasmul_128_100 ... bench: 107,305 ns/iter (+/- 14,451)

test linalg::matrix::mat_mul_128_1000 ... bench: 1,994,442 ns/iter (+/- 79,774)
test linalg::matrix::mat_paramul_128_1000 ... bench: 1,147,764 ns/iter (+/- 136,592)
test linalg::matrix::mat_blasmul_128_1000 ... bench: 996,405 ns/iter (+/- 109,778)

test linalg::matrix::mat_mul_128_10000 ... bench: 21,185,583 ns/iter (+/- 794,584)
test linalg::matrix::mat_paramul_128_10000 ... bench: 11,687,473 ns/iter (+/- 638,582)
test linalg::matrix::mat_blasmul_128_10000 ... bench: 10,278,981 ns/iter (+/- 973,273)

test linalg::matrix::mat_mul_128_100000 ... bench: 210,618,866 ns/iter (+/- 4,908,516)
test linalg::matrix::mat_paramul_128_100000 ... bench: 112,120,346 ns/iter (+/- 6,052,281)
test linalg::matrix::mat_blasmul_128_100000 ... bench: 102,699,089 ns/iter (+/- 9,024,207)
For the larger sizes we get close to a 2x increase in performance (on my sub-par laptop) when using the parallel implementation, which is currently on the paramul branch; the smallest benchmark shows almost no gain. The above results are for f32 only. For f64 the largest benchmark produces:
test linalg::matrix::mat_mul_f64_128_100000 ... bench: 445,007,480 ns/iter (+/- 71,323,075)
test linalg::matrix::mat_paramul_f64_128_100000 ... bench: 254,693,413 ns/iter (+/- 57,254,546)
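To make the comparison concrete, the speedup of the parallel version over the serial one can be read off the numbers above as serial ns/iter divided by parallel ns/iter. The small Rust helper below is only an illustration: `RESULTS` and `speedups` are hypothetical names, and the labels are shorthand for the benchmark names in the output.

```rust
/// (label, serial ns/iter, parallel ns/iter), copied from the benchmark output above.
/// The labels are shorthand, not the real benchmark names.
const RESULTS: [(&str, u64, u64); 5] = [
    ("f32 128_100", 221_813, 213_257),
    ("f32 128_1000", 1_994_442, 1_147_764),
    ("f32 128_10000", 21_185_583, 11_687_473),
    ("f32 128_100000", 210_618_866, 112_120_346),
    ("f64 128_100000", 445_007_480, 254_693_413),
];

/// Speedup of the parallel multiply over the serial one for each benchmark.
fn speedups() -> Vec<(&'static str, f64)> {
    RESULTS
        .iter()
        .map(|&(label, serial, parallel)| (label, serial as f64 / parallel as f64))
        .collect()
}
```

The ratios come out at roughly 1.04, 1.74, 1.81 and 1.88 for f32, and roughly 1.75 for the f64 run, so the gain grows with problem size and the smallest case barely benefits.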
This is a promising start. This issue will track progress.
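For readers unfamiliar with the idea, here is a minimal sketch of divide and conquer parallelism for matrix multiplication. It is not the code on the paramul branch and it does not use rulinalg's MatrixSlice; plain row-major slices stand in for matrix views, and `mul_serial`, `mul_par` and `threshold` are hypothetical names chosen for the example. It assumes a Rust toolchain with `std::thread::scope` (1.63 or later).

```rust
use std::thread;

/// Serial kernel: c (rows x n) = a (rows x k) * b (k x n), all row-major.
fn mul_serial(a: &[f32], b: &[f32], c: &mut [f32], k: usize, n: usize) {
    for (a_row, c_row) in a.chunks(k).zip(c.chunks_mut(n)) {
        for x in c_row.iter_mut() {
            *x = 0.0;
        }
        // Accumulate a_row[p] * (row p of b) into the output row.
        for (p, &a_val) in a_row.iter().enumerate() {
            let b_row = &b[p * n..(p + 1) * n];
            for (c_val, &b_val) in c_row.iter_mut().zip(b_row) {
                *c_val += a_val * b_val;
            }
        }
    }
}

/// Divide and conquer: split the rows of `a` (and the matching rows of `c`)
/// in half, multiply each half on its own thread, and recurse until a block
/// has at most `threshold` rows.
fn mul_par(a: &[f32], b: &[f32], c: &mut [f32], k: usize, n: usize, threshold: usize) {
    assert!(k > 0 && n > 0, "this sketch assumes non-empty dimensions");
    let rows = a.len() / k;
    // `max(1)` guarantees the recursion terminates even if threshold is 0.
    if rows <= threshold.max(1) {
        mul_serial(a, b, c, k, n);
        return;
    }
    let half = rows / 2;
    let (a_top, a_bot) = a.split_at(half * k);
    // The two output halves are disjoint, so they can be written concurrently.
    let (c_top, c_bot) = c.split_at_mut(half * n);
    thread::scope(|s| {
        s.spawn(move || mul_par(a_top, b, c_top, k, n, threshold));
        mul_par(a_bot, b, c_bot, k, n, threshold);
    });
}
```

Calling `mul_par(&a, &b, &mut c, k, n, 64)` would stop splitting once a block has 64 rows or fewer. Spawning an OS thread per split is the simplest thing that works here; a real implementation would more likely hand the halves to a thread pool and tune the threshold by benchmarking.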
Copied from original issue: AtheMathmo/rusty-machine#44
For information, there is no paramul branch in this repo.
Ah, you're right. I'll try to port the branch this weekend - though I'm not sure how close it is to being usable.
Just wanted to let you know. No hurry.
And thank you for doing so!
Any news on this? I recently started using this lib (great work btw!!) and was thinking that it would be nice to have parallelised matrix ops and found this issue.
There hasn't been any progress on this issue and I think that unfortunately it is fairly low down on our list of priorities. There are some correctness issues we should tackle first.
Understood. Totally agree that correctness should be highest priority :)