Optimization after refactoring to AD in 0.30 - example failing
Hi, thanks a lot for the great library.
I tried running the example in optimization.rs with 0.30.6 and it does not optimize: the output `p` keeps the initial value. With 0.29.1 (i.e. before the refactoring from `Number` to `AD`), its example works fine.
Thanks a lot,
Pavel.
@pavhofman Thanks for the good point!
It is caused by the operation rules of `AD`. In the example, the `quad` function is given as follows:
```rust
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| AD::from(t))
            .map(|t| t.pow(n[0]))
            .collect()
    )
}
```
See `AD::from(t)`. `AD::from` is the function with the signature `f64 -> AD0`. In the next step, `t.pow(n[0])` therefore represents `AD0.pow(AD0 or AD1 or AD2)`. But any arithmetic operation between `AD` enums always returns the lower order, so `t.pow(n[0])` returns the `AD0` type, and `AD0` has zero derivative. Therefore, we can't optimize with the above function.
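To make the order rule concrete, here is a minimal sketch (assuming the `AD0`/`AD1` constructors and the `pow` method used elsewhere in this thread, and that `AD` derives `Debug` and `Copy`):

```rust
use peroxide::fuga::*;

fn main() {
    let n = AD1(2f64, 1f64);  // parameter: value 2, derivative seed 1
    let t0 = AD0(3f64);       // zeroth order: no derivative slot
    let t1 = AD1(3f64, 0f64); // first order: constant w.r.t. the parameter

    // AD0.pow(AD1) collapses to the lower order AD0 -> derivative lost
    println!("{:?}", t0.pow(n));
    // AD1.pow(AD1) stays AD1 -> derivative 3^2 * ln 3 survives
    println!("{:?}", t1.pow(n));
}
```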
But we can optimize with the function below!
```rust
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| AD1(t, 0f64))
            .map(|t| t.pow(n[0]))
            .collect()
    )
}
```
Because of the `AD1` type, it works! (I'll update this example in the next version.)
BTW, which syntax seems better? If you choose the second one or suggest another idea, I'll implement it in the next version.
```rust
// First one - Current version (`0.30.6`)
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| AD1(t, 0f64))
            .map(|t| t.pow(n[0]))
            .collect()
    )
}

// Second one - Candidate for a future version
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| t.to_ad1())
            .map(|t| t.pow(n[0]))
            .collect()
    )
}
```
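For illustration, the second candidate could be supplied by an extension trait on `f64`. This is a hypothetical sketch; neither `ToAD` nor `to_ad1` exists in `0.30.6`:

```rust
use peroxide::fuga::*;

// Hypothetical extension trait, not part of peroxide 0.30.6
trait ToAD {
    fn to_ad1(self) -> AD;
}

impl ToAD for f64 {
    // Lift a plain value into a first-order AD number with a zero seed,
    // so arithmetic with AD1 parameters keeps the derivative slot
    fn to_ad1(self) -> AD {
        AD1(self, 0f64)
    }
}
```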
Hi, thank you very much for the very fast fix. I will try it tomorrow.
As for the choices - I would probably prefer the `t.to_ad1()` version, as it does not require any extra constant in the API.
I am a newbie to Rust. Today I tried hard (and always failed) to write a closure that would define the optimized function in one place - to be used both for calculating the measured or precise values and for running the optimization - `t.pow(n[0])` in your example. So far, all of my optimization use cases (in Octave/Matlab) involved other variables too, which means a closure instead of a function.
Would it be possible to consider an option or API for using one function/closure both for this:

```rust
// Noise to image
let mut y = x.fmap(|t| t.powi(2));
```

and for this:

```rust
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> { /* ... */ }
```

Just the part inside the iteration. Often the optimized functions are very long, and having them in two places is inconvenient and error-prone.
Thanks a lot for your great effort.
With regards,
Pavel.
Thanks for the great suggestion. I have also felt the inconvenience of optimization in peroxide. In fact, keeping the optimized function in one place can be achieved with a generic function.
```rust
#[macro_use]
extern crate peroxide;
use peroxide::fuga::*;

fn main() {
    // ...
    // Noise to image
    let mut y = x.fmap(|t| pow_fn(t, &vec![2f64])); // <---- use pow_fn
    // ...
    // Optimizer setting
    let mut opt = Optimizer::new(data, pow_optim);
    // ...
}

// One generic function (`Real` = `f64` or `AD`)
fn pow_fn<T: Real>(x: T, n: &Vec<T>) -> T {
    x.pow(n[0])
}

// Boilerplate to use it with the optimizer
fn pow_optim(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| AD1(t, 0f64))  // data: first-order AD with zero seed
            .map(|t| pow_fn(t, &n)) // <---- use pow_fn
            .collect()
    )
}
```
But this seems uncomfortable because of the boilerplate, so the API should be improved. Maybe the next code is more comfortable:
```rust
// Candidate
fn main() {
    // ...
    // Optimizer setting
    let mut opt = Optimizer::new(data, pow_fn_optim);
}

#[optimize]
fn pow_fn<T: Real>(x: T, n: &Vec<T>) -> T {
    x.pow(n[0])
}
```
`#[optimize]` automatically generates `fn pow_fn_optim(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>>` via a procedural macro. In this case, you would use `FUNCTIONNAME_optim` to construct the optimizer.
I'll implement the above functionality in the next version. (Maybe `0.31.0`.)
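For the curious, such an attribute macro could look roughly like the sketch below. This is hypothetical, not the actual peroxide implementation: it assumes a separate proc-macro crate with `syn` (with the `full` feature) and `quote` as dependencies, and it hard-codes the wrapper shape shown above.

```rust
use proc_macro::TokenStream;
use quote::{format_ident, quote};
use syn::{parse_macro_input, ItemFn};

#[proc_macro_attribute]
pub fn optimize(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let func = parse_macro_input!(item as ItemFn);
    let name = &func.sig.ident;
    let wrapper = format_ident!("{}_optim", name);
    // Emit the original function plus the optimizer boilerplate around it
    quote! {
        #func

        fn #wrapper(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
            Some(
                x.clone().into_iter()
                    .map(|t| AD1(t, 0f64))
                    .map(|t| #name(t, &n))
                    .collect()
            )
        }
    }
    .into()
}
```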
Thank you once again for the great suggestions! :smile:
Hi, thanks a lot for your great support.
The updated example works very well now. I am just afraid the move to `AD` has introduced some extra wrapping in the critical path, which affects performance. This is a comparison of your example in 0.29 and 0.30 for the 100 points as in the example (time measured by the measure_time crate, just for the optimization run plus the printout):
```rust
{
    print_time!("{:?}", "Regression 0.30");
    let p = opt.set_init_param(n_init)
        .set_max_iter(200)
        .set_method(LevenbergMarquardt)
        .optimize();
    p.print();               // Optimized parameter
    opt.get_error().print(); // Optimized RMSE
}
```
```
target/release/test_rust
[2.0002]
101.4596320642305
"Regression 0.30" took 4.10994ms

target/release/test_rust
[1.9994]
113.37364941459128
"Regression 0.29" took 2.861277ms
```
And for 10,000 points (`let mut x = seq(0., 99., 0.01f64);`):
```
target/release/test_rust
[2.0000]
100.30175592699732
"Regression 0.30" took 325.714667ms

target/release/test_rust
[2.0001]
100.5700492713708
"Regression 0.29" took 264.950832ms
```
Please, is there any way to do something about the performance regression? I am afraid it is a major slowdown, especially for the shorter runs. My application is embedded/soft real-time and every ms counts, hence the choice of Rust :-)
Thank you very much for your suggestion of the generic function. IIUC there are no generic closures in Rust. The problem is that all of my optimization functions require additional (fixed, non-optimized) parameters, which I do not know how to pass without using a closure instead of a function.
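One possible shape for this (an editorial sketch, not from the thread): lift the fixed parameters into the generic type alongside the optimized ones and let a capturing closure do the wrapping. `gain_pow` is an illustrative name, and the sketch assumes `Real` carries the usual arithmetic bounds (`Mul<Output = T>` in particular); if it does not, the multiplication would have to move into the concrete wrappers.

```rust
use peroxide::fuga::*;

// Hypothetical generic core: the fixed `gain` is just another T argument
fn gain_pow<T: Real>(x: T, gain: T, n: &Vec<T>) -> T {
    gain * x.pow(n[0])
}

fn main() {
    let gain = 5f64;

    // Measurement side: plain f64 (f64 implements Real)
    println!("{}", gain_pow(3f64, gain, &vec![2f64])); // 5 * 3^2 = 45

    // Optimization side: the closure captures gain and lifts it to AD1
    // with a zero seed, so the order rule keeps the derivative alive
    let quad = move |x: &Vec<f64>, n: Vec<AD>| -> Option<Vec<AD>> {
        Some(
            x.iter()
                .map(|&t| gain_pow(AD1(t, 0f64), AD1(gain, 0f64), &n))
                .collect()
        )
    };
    println!("{:?}", quad(&vec![1f64, 2f64], vec![AD1(2f64, 1f64)]));
}
```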
I am sorry for posting so many questions :-) Thanks a lot for your patience.
Best regards,
Pavel.
Maybe there are four ways to achieve high performance.

1. Reduce the number of iterations: in this case, 50 max iterations are enough.

2. Reduce the number of wrappings: as you mentioned, the extra wrapping makes things slower, so I'll implement direct operations between `f64` and `AD` soon. For the time being, the code below is faster.
```rust
// Current version
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| pow_temp(t, n[0]))
            .collect()
    )
}

#[inline]
fn pow_temp(x: f64, y: AD) -> AD {
    // Chain rule by hand: d/dp x^y(p) = x^y * ln(x) * y'(p)
    AD1(x.powf(y.x()), x.powf(y.x()) * x.ln() * y.dx())
}
```
For the next version, you will be able to use the code below.
```rust
// Future version
fn quad(x: &Vec<f64>, n: Vec<AD>) -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| t.pow(n[0])) // <-- direct f64-AD computation
            .collect()
    )
}
```
3. Change the structure of `AD`: until `0.28.2`, peroxide used `proc_macro` to generate the `AD` structures. That guaranteed perfect runtime performance, but it required egregious compile times, so after `0.30.0` I changed the structure of `AD` to enums. Which do you prefer? It's helpful to refer to #38.

4. JAAF (Jacobian As A Function): in the current version, the optimizer has to obtain the jacobian at every step. Though it is obtained by automatic differentiation, it decreases performance. If the jacobian could be pre-calculated at the first step into a semi-symbolic form via an algebraic expression evaluation engine (e.g. fasteval, sketched below), then a huge performance improvement is expected. I plan to implement this in the future.
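For reference, the sketch below shows the kind of expression evaluation fasteval itself provides (plain fasteval usage against a variable namespace, not a peroxide integration): the model `x^n` is kept as a string and re-evaluated with new variable values, which is the "semi-symbolic" idea above.

```rust
use std::collections::BTreeMap;

fn main() -> Result<(), fasteval::Error> {
    // Variable namespace: a BTreeMap<String, f64> works out of the box
    let mut ns: BTreeMap<String, f64> = BTreeMap::new();
    ns.insert("x".to_string(), 3.0);
    ns.insert("n".to_string(), 2.0);

    // Parse and evaluate the expression string against the namespace
    let val = fasteval::ez_eval("x^n", &mut ns)?;
    println!("{}", val); // 9

    Ok(())
}
```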
Thanks for the great benchmark. The results have alarmed me; I will think about performance more seriously.
Thanks! Regarding the jacobian - would it be possible to supply the jacobian matrix externally, just like the optimized function, i.e. as a closure? From what I have seen, most engines allow that - e.g. `optimset('dfdp', fDfdp)` in Octave, `gradient` in optimization-engine (https://github.com/alphaville/optimization-engine/blob/master/examples/panoc_ex1.rs#L30), etc. Automatic calculation as the default, but with an option for a user-provided function/closure.
From my POV, compilation time is not critical but runtime is: cross-compiled on a powerful PC, running on an embedded ARM. Other users may have it reversed, but probably not.
I think I found a way to make the quad function a closure and use it both for creating the measured values and for running the optimization. The one-time conversion from `AD` when generating `y` is no overhead.
```rust
let gain = 5.0;
let quad_closure = |x: &Vec<f64>, n: Vec<AD>| -> Option<Vec<AD>> {
    Some(
        x.clone().into_iter()
            .map(|t| AD1(t, 0f64))
            .map(|t| gain * t.pow(n[0]))
            .collect()
    )
};

// Noise to image
let n = vec![AD0(2.0)];
let mut y: Vec<f64> = quad_closure(&x, n).unwrap().into_iter().map(|t| t.x()).collect();
// ...
let mut opt = Optimizer::new(data, quad_closure);
```
Actually, I am comparing re-compilation times of my example in v0.28.1 and v0.30.6 and do not notice any difference. The crate itself is compiled only once; no problem waiting a bit, IMO.
```
Finished release [optimized] target(s) in 1.52s

real    0m1,651s
user    0m4,328s
sys     0m0,205s

Finished release [optimized] target(s) in 1.49s

real    0m1,616s
user    0m4,195s
sys     0m0,230s
```
As for the iteration runs - some engines have an optional parameter (with some tiny default) for the minimum change ratio of the parameters between iterations (e.g. less than 0.1%, or even as a vector with a value for each parameter); if an iteration changes all parameters by less than their minimum change ratios, the optimization stops. Typically the convergence is much faster than 50 runs - below 20 in your example. Perhaps that could be a way towards a major speedup; estimating the required number of iterations upfront is difficult, IMO.
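A minimal sketch of the suggested stopping rule (editorial, not a peroxide API; `p_old`, `p_new`, and `tol` are illustrative names):

```rust
/// Stop when every parameter's relative change between two iterations
/// falls below its per-parameter tolerance (e.g. 0.001 for 0.1%).
fn converged(p_old: &[f64], p_new: &[f64], tol: &[f64]) -> bool {
    p_old.iter().zip(p_new).zip(tol).all(|((&old, &new), &t)| {
        let denom = old.abs().max(f64::EPSILON); // guard against zero
        ((new - old) / denom).abs() < t
    })
}
```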
Sorry for taking so much of your time :-)
I have reread my posts - it sounds like I am just requesting features instead of sending patches. I have just started with Rust, but I can try adding e.g. the minimum change option and send a PR for review/rework, if you agree. Please just let me know when your planned changes are done so that I can start with the latest code. Thanks a lot, I very much appreciate your kind help and effort.
> Thanks! Regarding the jacobian - would it be possible to supply the jacobian matrix externally, just like the optimized function, i.e. as a closure? [...] Automatic calculation as the default, but with an option for a user-provided function/closure.
Good suggestion! In a future version, a manual gradient function (or closure) could be allowed. (Unfortunately, it's impossible in the current version.)
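A hypothetical shape for such a user-supplied jacobian, mirroring how other engines accept it (the closure signature and the `set_jacobian` registration are illustrative, not an existing peroxide API):

```rust
use peroxide::fuga::*;

fn main() {
    let x = seq(1., 99., 1f64); // avoid x = 0, where ln(x) is undefined

    // For the model x^n: d/dn x^n = x^n * ln(x),
    // one row per data point, one column per parameter
    let quad_jacobian = |x: &Vec<f64>, n: &Vec<f64>| -> Matrix {
        let col: Vec<f64> = x.iter().map(|&t| t.powf(n[0]) * t.ln()).collect();
        matrix(col, x.len(), 1, Col)
    };

    quad_jacobian(&x, &vec![2f64]).print();
    // Hypothetical registration, analogous to Octave's optimset('dfdp', ...):
    // opt.set_jacobian(quad_jacobian);
}
```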
> Actually, I am comparing re-compilation times of my example in v0.28.1 and v0.30.6 and do not notice any difference. The crate itself is compiled only once; no problem waiting a bit, IMO.
That is because of the implementation. Until `0.29`, optimization depended on `Number`, not `AD`. So in `0.28.1`, although `AD` exists, optimization does not depend on `AD`. Of course, simple enums on the stack are also fast, so maybe the gap is not significant. Until Rust's compile times improve, I think it's okay to keep the status quo.
> As for the iteration runs - some engines have an optional parameter (with some tiny default) for the minimum change ratio of the parameters between iterations [...] Typically the convergence is much faster than 50 runs - below 20 in your example.
Oh, I forgot to implement a stop condition. I already implemented that feature in `ode`, which has a similar API. That is a good point!
> Sorry for taking so much of your time :-)
Don't be - these discussions are also meaningful to me :smile:
> I have reread my posts - it sounds like I am just requesting features instead of sending patches. I have just started with Rust, but I can try adding e.g. the minimum change option and send a PR for review/rework, if you agree. [...]
Just requesting features is also helpful enough for peroxide. Of course, a PR is always welcome! When I finish the modification of the operations between `f64` and `AD`, I'll leave a comment on this thread.
Thanks for your great interests & suggestions! :+1: