optimization
Summary
Optimize kernel generator so it can use matrix_cl's move assignment where possible instead of copying the data.
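To illustrate the idea behind the change, here is a minimal sketch of choosing between copy and move assignment depending on whether the source is a temporary. It does not use Stan Math or OpenCL: `Buffer` is a hypothetical stand-in for `matrix_cl`, and `assign_result` and `counters` are names invented for this example, not part of the kernel generator.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts how often each assignment operator runs (illustration only).
struct Counters {
  int copies = 0;
  int moves = 0;
};
Counters counters;

// Hypothetical stand-in for a device matrix; NOT the real matrix_cl.
struct Buffer {
  std::vector<double> data;

  Buffer() = default;
  explicit Buffer(std::size_t n, double v = 0.0) : data(n, v) {}
  Buffer(const Buffer&) = default;
  Buffer(Buffer&&) noexcept = default;

  // Copy assignment: duplicates the underlying storage.
  Buffer& operator=(const Buffer& other) {
    data = other.data;
    ++counters.copies;
    return *this;
  }

  // Move assignment: takes over the storage of the source without copying it.
  Buffer& operator=(Buffer&& other) noexcept {
    data = std::move(other.data);
    ++counters.moves;
    return *this;
  }
};

// Assumed shape of the optimization: forward the source so that an rvalue
// (a temporary or an explicitly moved object) selects move assignment,
// while an lvalue still falls back to a copy.
template <typename Src>
void assign_result(Buffer& dst, Src&& src) {
  dst = std::forward<Src>(src);
}
```

Calling `assign_result(dst, src)` with a named `src` copies the data, whereas `assign_result(dst, Buffer(4, 2.0))` or `assign_result(dst, std::move(src))` only transfers ownership of the storage.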
Tests
Added a test to check that the new optimization is used.
Side Effects
None.
Release notes
OpenCL: Optimized kernel generator so it can use matrix_cl's move assignment where possible instead of copying the data.
Checklist
- [ ] Math issue #(issue number)
- [ ] Copyright holder: Tadej Ciglarič

  The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
  - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
  - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- [ ] the basic tests are passing
  - unit tests pass (to run, use: ./runTests.py test/unit)
  - header checks pass (make test-headers)
  - dependencies checks pass (make test-math-dependencies)
  - docs build (make doxygen)
  - code passes the built in C++ standards checks (make cpplint)
- [ ] the code is written in idiomatic C++ and changes are documented in the doxygen
- [ ] the new changes are tested
| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.14 | 3.03 | 1.04 | 3.45% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -3.07% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.05 | 5.04% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 1.14% faster |
| irt_2pl/irt_2pl.stan | 5.9 | 5.82 | 1.01 | 1.41% faster |
| performance.compilation | 87.82 | 87.27 | 1.01 | 0.62% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.67 | 8.55 | 1.01 | 1.29% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.96 | 29.69 | 1.01 | 0.91% faster |
| sir/sir.stan | 128.18 | 130.53 | 0.98 | -1.83% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | 0.42% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.01 | 2.99 | 1.01 | 0.97% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.39 | 1.02 | 2.2% faster |
| arK/arK.stan | 1.88 | 1.88 | 1.0 | 0.09% faster |
| arma/arma.stan | 0.93 | 0.82 | 1.12 | 10.97% faster |
| garch/garch.stan | 0.63 | 0.53 | 1.19 | 15.83% faster |

Mean result: 1.02960269611
Jenkins Console Log
Blue Ocean
Commit hash: 6738d7bb8cf8b5139e167b9d2fbbafa9855755d6
Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010
CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
As much as I'd prefer not to lose good work, there were outstanding comments that weren't addressed, so I'm closing this.