rulinalg Divide and Conquer Parallelism

From @AtheMathmo on April 17, 2016 0:55

Would be nice if we could get things running on more than one core! I've been playing around with getting this working for matrix multiplication for a while. Now that we have MatrixSlice we can get something decent working. My initial tests produced the following benchmarks:

test linalg::matrix::mat_mul_128_100 ... bench: 221,813 ns/iter (+/- 28,576)
test linalg::matrix::mat_paramul_128_100 ... bench: 213,257 ns/iter (+/- 16,667)
test linalg::matrix::mat_blasmul_128_100 ... bench: 107,305 ns/iter (+/- 14,451)

test linalg::matrix::mat_mul_128_1000 ... bench: 1,994,442 ns/iter (+/- 79,774)
test linalg::matrix::mat_paramul_128_1000 ... bench: 1,147,764 ns/iter (+/- 136,592)
test linalg::matrix::mat_blasmul_128_1000 ... bench: 996,405 ns/iter (+/- 109,778)

test linalg::matrix::mat_mul_128_10000 ... bench: 21,185,583 ns/iter (+/- 794,584)
test linalg::matrix::mat_paramul_128_10000 ... bench: 11,687,473 ns/iter (+/- 638,582)
test linalg::matrix::mat_blasmul_128_10000 ... bench: 10,278,981 ns/iter (+/- 973,273)

test linalg::matrix::mat_mul_128_100000 ... bench: 210,618,866 ns/iter (+/- 4,908,516)
test linalg::matrix::mat_paramul_128_100000 ... bench: 112,120,346 ns/iter (+/- 6,052,281)
test linalg::matrix::mat_blasmul_128_100000 ... bench: 102,699,089 ns/iter (+/- 9,024,207)

Except for the smallest benchmark (128_100), where the two are about even, the parallel implementation (currently on the paramul branch) is roughly 1.7x to 1.9x faster than the serial one on my sub-par laptop. The above results are for f32 only. For f64 the largest benchmark produces:

test linalg::matrix::mat_mul_f64_128_100000 ... bench: 445,007,480 ns/iter (+/- 71,323,075)
test linalg::matrix::mat_paramul_f64_128_100000 ... bench: 254,693,413 ns/iter (+/- 57,254,546)

This is a promising start. This issue will track progress.
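
For readers unfamiliar with the idea, here is a minimal sketch of divide-and-conquer parallel matrix multiplication. It is not the code on the paramul branch and it does not use rulinalg's Matrix or MatrixSlice types: it assumes plain row-major `&[f64]` slices, uses only the standard library (`std::thread::scope`, Rust 1.63+), and the names `par_matmul`, `matmul_dc`, `matmul_serial` and the `min_rows` cutoff are all hypothetical. The rows of the left operand (and of the output) are split in half recursively, and the two halves are computed on separate threads until a block is small enough to multiply serially.

```rust
use std::thread;

/// Serial kernel: c (m x n) = a (m x k) * b (k x n), all row-major.
fn matmul_serial(a: &[f64], b: &[f64], c: &mut [f64], k: usize, n: usize) {
    for (a_row, c_row) in a.chunks_exact(k).zip(c.chunks_exact_mut(n)) {
        c_row.fill(0.0);
        // Accumulate a_row[p] * (row p of b) into the output row.
        for (&a_ip, b_row) in a_row.iter().zip(b.chunks_exact(n)) {
            for (c_ij, &b_pj) in c_row.iter_mut().zip(b_row) {
                *c_ij += a_ip * b_pj;
            }
        }
    }
}

/// Recursive step: split the rows of `a` and `c` in half and compute the
/// halves in parallel. `min_rows` is an assumed tuning knob: blocks with at
/// most that many rows are handled by the serial kernel.
fn matmul_dc(a: &[f64], b: &[f64], c: &mut [f64], k: usize, n: usize, min_rows: usize) {
    let m = a.len() / k;
    if m <= min_rows.max(1) {
        matmul_serial(a, b, c, k, n);
        return;
    }
    let half = m / 2;
    let (a_top, a_bot) = a.split_at(half * k);
    // The output halves are disjoint, so no locking is needed.
    let (c_top, c_bot) = c.split_at_mut(half * n);
    thread::scope(|s| {
        s.spawn(|| matmul_dc(a_top, b, c_top, k, n, min_rows));
        // Reuse the current thread for the other half.
        matmul_dc(a_bot, b, c_bot, k, n, min_rows);
    });
}

/// Entry point: multiplies an m x k matrix by a k x n matrix.
pub fn par_matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize, min_rows: usize) -> Vec<f64> {
    assert!(k > 0 && n > 0, "inner and column dimensions must be non-zero");
    assert_eq!(a.len(), m * k, "a must hold m * k elements");
    assert_eq!(b.len(), k * n, "b must hold k * n elements");
    let mut c = vec![0.0; m * n];
    matmul_dc(a, b, &mut c, k, n, min_rows);
    c
}
```

Calling `par_matmul(&a, &b, m, k, n, 32)` would multiply with a 32-row serial cutoff. Splitting only along the output rows keeps every thread writing to its own region, at the cost of each thread reading all of `b`. A real implementation would more likely use a work-stealing pool than spawn a thread per split, and would tune the cutoff against benchmarks like the ones above.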

Copied from original issue: AtheMathmo/rusty-machine#44

Jul 12 '16 02:07 AtheMathmo

For information, there is no paramul branch in this repo.

Jul 28 '16 02:07 tafia

Ah, you're right. I'll try to port the branch this weekend - though I'm not sure how close it is to being usable.

Jul 28 '16 03:07 AtheMathmo

Just wanted to let you know. No hurry

Jul 28 '16 03:07 tafia

And thank you for doing so!

Jul 28 '16 03:07 AtheMathmo

Any news on this? I recently started using this lib (great work btw!!) and was thinking that it would be nice to have parallelised matrix ops and found this issue.

Aug 08 '17 14:08 lloydmeta

There hasn't been any progress on this issue and I think that unfortunately it is fairly low down on our list of priorities. There are some correctness issues we should tackle first.

Aug 08 '17 15:08 AtheMathmo

Understood. Totally agree that correctness should be highest priority :)

Aug 09 '17 01:08 lloydmeta
