Femi
Flamegraphs from profiling attached. I haven't studied profiling or flamegraphs enough to build a firm foundation yet, so I can't provide any insights. Each profile was run for 1 min. [Linfa_pls.zip](https://github.com/rust-ml/linfa/files/10018806/Linfa_pls.zip)
[par_azip](https://docs.rs/ndarray/latest/ndarray/macro.par_azip.html) can be used in place of `zip` in our code, as this is where a significant amount of time is spent. Regression-Nipals-5feats-100_000 would benefit from the above; see the screenshots...
Our `param_guard` code:

```rust
/// Performs checking step and calls `fit` on the checked hyperparameters. If checking failed, the
/// checking error is converted to the original error type of...
```
Hmm, did I interpret the flamegraph wrong? I thought that since it was at the top, most of the CPU time was spent there. Or I guess it's possible that it's...
Flamegraphs from profiling attached. Each profile was run for 1 min. [Linfa_linear.zip](https://github.com/rust-ml/linfa/files/10018808/Linfa_linear.zip)
Did a quick review. The 10-feats GLM with 100_000 samples spends most of its CPU time at this step, `ndarray::zip::Zip::for_each`, which is called twice. Both calls share an ancestor of...
Also, if we use the rayon backend, we could look into using [par_azip](https://docs.rs/ndarray/latest/ndarray/macro.par_azip.html) instead of `azip`; it is the parallel version.
I think calling multiplication less often would likely be the bigger performance boost. I can locally test enabling the `matrixmultiply-threading` feature. I didn't profile the BLAS version. Is this desirable...
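For reference, enabling that locally should just be a feature flag on the ndarray dependency (version number here is assumed, adjust to whatever linfa pins):

```toml
[dependencies]
# matrixmultiply-threading lets ndarray's pure-Rust matmul use multiple threads
ndarray = { version = "0.15", features = ["matrixmultiply-threading"] }
```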
Okay, I'll run 2 more: one with BLAS and one with the `matrixmultiply-threading` feature.
The BLAS flamegraph is a little different than the other.