optimization

Open t4c1 opened this issue 4 years ago • 1 comments

Summary

Optimize kernel generator so it can use matrix_cl's move assignment where possible instead of copying the data.
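To illustrate the difference the summary refers to, here is a minimal sketch. It assumes a hypothetical `ToyMatrix` class that keeps its data in host memory; it is not `matrix_cl` and does not use the kernel generator or OpenCL. It only shows that assigning from an lvalue duplicates the storage, while assigning from an rvalue can take over the existing storage without a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device matrix. This is NOT matrix_cl; it only
// mimics copy-versus-move behaviour using host memory.
class ToyMatrix {
 public:
  static int deep_copies;  // incremented whenever the storage is duplicated

  ToyMatrix() = default;
  explicit ToyMatrix(std::size_t n) : data_(n, 1.0) {}

  ToyMatrix(const ToyMatrix& other) : data_(other.data_) { ++deep_copies; }
  ToyMatrix(ToyMatrix&& other) = default;

  // Copy assignment duplicates the underlying storage.
  ToyMatrix& operator=(const ToyMatrix& other) {
    if (this != &other) {
      data_ = other.data_;
      ++deep_copies;
    }
    return *this;
  }
  // Move assignment takes over the source's storage; nothing is duplicated.
  ToyMatrix& operator=(ToyMatrix&& other) = default;

  const double* storage() const { return data_.data(); }

 private:
  std::vector<double> data_;
};

int ToyMatrix::deep_copies = 0;

struct DemoResult {
  int copies_after_copy_assign;
  int copies_after_move_assign;
  bool storage_reused;
};

DemoResult run_demo() {
  ToyMatrix::deep_copies = 0;
  ToyMatrix source(1000);
  ToyMatrix destination;

  destination = source;  // lvalue source: copy assignment, one deep copy
  const int after_copy = ToyMatrix::deep_copies;

  const double* original_storage = source.storage();
  assert(original_storage != nullptr);
  destination = std::move(source);  // rvalue source: move assignment
  const int after_move = ToyMatrix::deep_copies;

  // If the move took over the buffer, the destination now points at the
  // storage that the source owned before the assignment.
  return {after_copy, after_move, destination.storage() == original_storage};
}
```

Calling `run_demo()` should report one deep copy after the copy assignment, no additional copy after the move assignment, and reused storage. Comparing storage identity before and after an assignment is one plausible way to check that a move was taken; whether the test added in this pull request works that way is not stated here.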

Tests

Added a test to check that the new optimization is used.

Side Effects

None.

Release notes

OpenCL: Optimized kernel generator so it can use matrix_cl's move assignment where possible instead of copying the data.

Checklist

  • [ ] Math issue #(issue number)

  • [ ] Copyright holder: Tadej Ciglarič

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

    • Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    • Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • [ ] the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • [ ] the code is written in idiomatic C++ and changes are documented in the doxygen

  • [ ] the new changes are tested

t4c1 • Jul 27 '21 12:07


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.14 | 3.03 | 1.04 | 3.45% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -3.07% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.05 | 5.04% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 1.14% faster |
| irt_2pl/irt_2pl.stan | 5.9 | 5.82 | 1.01 | 1.41% faster |
| performance.compilation | 87.82 | 87.27 | 1.01 | 0.62% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.67 | 8.55 | 1.01 | 1.29% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.96 | 29.69 | 1.01 | 0.91% faster |
| sir/sir.stan | 128.18 | 130.53 | 0.98 | -1.83% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | 0.42% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.01 | 2.99 | 1.01 | 0.97% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.39 | 1.02 | 2.2% faster |
| arK/arK.stan | 1.88 | 1.88 | 1.0 | 0.09% faster |
| arma/arma.stan | 0.93 | 0.82 | 1.12 | 10.97% faster |
| garch/garch.stan | 0.63 | 0.53 | 1.19 | 15.83% faster |

Mean result: 1.02960269611
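The columns of the benchmark report can be reproduced approximately with a few lines of arithmetic. The report header gives the performance change as 1 - new / old. That the Ratio column is old / new, and that "Mean result" is the arithmetic mean of the ratios, are assumptions inferred from the numbers (for example, 3.14 / 3.03 rounds to 1.04); the benchmark script itself is not shown here. The function names below are hypothetical. Because the displayed timings are rounded, values recomputed from them will not match the reported percentages exactly.

```cpp
#include <cassert>
#include <vector>

// Assumed definition of the "Ratio" column: old timing divided by new timing.
// Values above 1 mean the new build was faster.
double ratio(double old_result, double new_result) {
  return old_result / new_result;
}

// "Performance change" as given in the report header: 1 - new / old.
// Positive values mean faster, negative values mean slower.
double performance_change(double old_result, double new_result) {
  return 1.0 - new_result / old_result;
}

// Arithmetic mean; assumed to be how "Mean result" aggregates the ratios.
double mean(const std::vector<double>& values) {
  assert(!values.empty());
  double sum = 0.0;
  for (double v : values) {
    sum += v;
  }
  return sum / static_cast<double>(values.size());
}
```

For instance, `ratio(3.14, 3.03)` and `performance_change(3.14, 3.03)` give figures close to the first row of the report, with the small discrepancy coming from the rounded inputs.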

Jenkins Console Log • Blue Ocean

Commit hash: 6738d7bb8cf8b5139e167b9d2fbbafa9855755d6


Machine information

ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

stan-buildbot • Jul 27 '21 22:07

As much as I'd prefer not to lose good work, there were outstanding comments that weren't addressed, so I'm closing this.

syclik • Aug 18 '22 14:08