StochasticDelayDiffEq.jl
Solution performance for network of delays
I am trying to move towards an implementation of the Kuramoto oscillator network model with time delays and stochastic forcing. My first attempt here is just a linear SDDE:
```julia
using StochasticDelayDiffEq

# Drift: linear decay plus delayed all-to-all coupling
function f(du, u, h, d, t)
    aff = zero(u)
    for j in 1:length(u)
        for i in 1:length(u)
            aff[i] += h(d, t - d[i, j]; idxs = j)
        end
    end
    du .= -u .+ 0.1 .* aff
end

# Additive noise
function g(du, u, h, d, t)
    du .= 0.1
end

n = 100
h(d, t; idxs = nothing) = idxs === nothing ? (ones(n) .+ t) : (ones(n)[idxs] .+ t)
tspan = (0.0, 10.0)
ic = randn(n)
d = rand(n, n)

prob = SDDEProblem(f, g, ic, h, tspan, d; constant_lags = d)
sol = solve(prob, EM(), dt = 0.01, progress = true, progress_steps = 10)
```
Trying this on Julia v1.5 runs for several minutes without producing any output. With n=10, the solution is done in a few seconds. This suggests (to me) super-linear scaling of some delay-related data structures internally, and I am curious whether I'm using this the wrong way?
I'm not too surprised that it is quite a bit slower since the number of delays scales quadratically with n. Since you specify constant_lags (BTW I've never specified them as a matrix, but great if it works :smile:), the solver tracks the propagated discontinuities explicitly and reduces the time steps so that it hits them (more or less) exactly, up to a certain discontinuity order that depends on the chosen solver. You could try not specifying constant_lags - in that case the solver will not track the propagated discontinuities and will not try to hit them exactly, but will only decrease the step size if the error heuristic becomes too large (at least in DelayDiffEq, not completely sure ad hoc about the implementation in StochasticDelayDiffEq). This might speed up the computation since probably fewer steps are required, at the expense of an increased error.
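For reference, dropping constant_lags is just a one-argument change to the problem construction in the original post (a sketch, reusing f, g, ic, h, tspan, and d as defined above):

```julia
# Same problem, but without constant_lags: the solver no longer tracks
# propagated discontinuities explicitly and relies on error-based
# adaptivity alone to handle them.
prob_notrack = SDDEProblem(f, g, ic, h, tspan, d)
sol_notrack = solve(prob_notrack, EM(), dt = 0.01)
```

Note that the delay matrix d is still passed as the parameter object, so the drift can look up each lag; only the discontinuity-tracking keyword is removed.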
Thanks for the helpful details! I don't need the discontinuity propagation for now, I was just following the example in the unit tests. I'll drop that and see what happens.
That reminds me though, @YingboMa was mentioning that in some of Enright's or Hairer's work, discontinuities were not checked for if steps were accepted, which is something we could look into adding.
> in this case the solver will not track the propagated discontinuities and not try to hit them exactly but will only decrease the step size if the error heuristic becomes too large (at least in DelayDiffEq, not completely sure ad hoc about the implementation in StochasticDelayDiffEq)
The error-checking adaptivity will do something similar here. But note that SDE integrators are of such low order that discontinuities shouldn't have to propagate far, so we can definitely do better in the detection process.
> That reminds me though, @YingboMa was mentioning that in some of Enright's or Hairer's work, discontinuities were not checked for if steps were accepted, which is something we could look into adding.
Dependent delays are already handled in this way (similar to RADAR5): https://github.com/SciML/DelayDiffEq.jl/pull/121
Thanks for the tips & info. For n=10, dropping constant_lags is 10x faster; for n=20 it's 100x. For my problem size of n=164 this makes it workable, so I'm going to close this issue (since this was a feature, not a bug, in any case).
Yeah, 164... oof. Even with a 1st-order method that's placing 26896 discontinuities. I see that this is the issue: with dependent delays we don't do the extra computation if we accept a step, but with constant lags we step exactly to the discontinuity, so it's "normally" not a huge issue. In this case, though, you are forced to hit 26896 points in the integration, many of which could probably be stepped over while still hitting tolerance, so we probably could ignore some. This is something to think about.
Though n=10 shouldn't be an issue: if the normal solve was like 100 steps, you shouldn't be able to see the 10 discontinuity handling points, which makes me think something is going wrong. It might be propagating to higher order when 1st order is all that's needed?
In any case the above example exaggerates the problem, since rand(n,n) generates that many unique values. In the domain for this set of equations, the delays are bounded on an interval of 0 to 256 ms, and an error of a few ms in a delay is not problematic. We therefore in practice round delays to the nearest integer number of time steps (using EM or Heun) so that the solver hits every delay exactly without the need to interpolate.
If I try this in the above code by constraining delays to integer multiples of the step size,

```julia
d = rand(n, n)
dt = 0.01
d = floor.(d .^ 4 ./ dt) .* dt
```
this results in 100 unique delay values (instead of 164^2), each on the solver time grid (since this is EM with dt=0.01). This is about 15% faster (comparing the last of 4 runs to avoid JIT overhead). Setting save_everystep=false reduces that by another 45%. Switching methods to SRIW1 halves it again. This is fast enough for my purposes.
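The claim about unique delay values can be checked directly with the standard library (a sketch using the n and dt from this thread; the variable names nunique and all_on_grid are illustrative, not from the original post):

```julia
n, dt = 164, 0.01
d = rand(n, n)
d = floor.(d .^ 4 ./ dt) .* dt      # snap each delay onto the time grid

# d^4 lies in [0, 1), so floor(d^4/dt) takes values 0..99:
# at most 100 distinct delays instead of n^2 = 26896.
nunique = length(unique(d))

# Every delay should be (numerically) an integer multiple of dt.
all_on_grid = all(x -> isapprox(round(x / dt) * dt, x; atol = 1e-12), d)
```

With far fewer distinct lags, the discontinuity tree the solver would otherwise track stays small, and with EM at dt=0.01 every lag lands exactly on a step, so no history interpolation is needed.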
Great! It would be good to establish some benchmark problems for this repo, since it's new enough that we don't have a set to optimize against yet. So if that ever becomes a public model, it would be nice if you could donate it 😄 .
I'd be happy to contribute the model, though the analytic result I have (not for the solution itself but for the critical parameter value of the phase transition) is only for the non-delayed case. I'm checking to see if that can be relaxed. I am also trying to make work-precision diagrams for the different solvers with respect to this analytic result, which I could share as a notebook or something.
This may not be the format you had in mind, but this is what's in my editor right now, and I figured I'd copy-paste before I forget about this issue:
```julia
using StochasticDelayDiffEq, Statistics, DelimitedFiles, ZipFile

# Download weights & tract lengths for the TVB 76-region connectome
function get_tvb_connectome()
    fname = "conn76.zip"
    if !isfile(fname)
        url = "https://github.com/the-virtual-brain/tvb-data/raw/master/tvb_data/connectivity/connectivity_76.zip"
        download(url, fname)
    end
    zf = ZipFile.Reader(fname)
    l = readdlm(zf.files[6])  # tract lengths (delays)
    w = readdlm(zf.files[7])  # coupling weights
    w, l
end

# Kuramoto parameters: global coupling G, noise intensity σ,
# coupling weights w, delay matrix l
struct kp
    G::Float64
    σ::Float64
    w::Array{Float64,2}
    l::Array{Float64,2}
end

# Drift: intrinsic frequency plus delayed sinusoidal coupling
function kf(du, u, h, p, t)
    G = p.G / length(u)
    du .= 2 * 2 * π / 1000
    for j in 1:length(u)
        for i in 1:length(u)
            if i != j
                uⱼ = h(p, t - p.l[i, j]; idxs = j)
                du[i] += G * p.w[i, j] * sin(uⱼ - u[i])
            end
        end
    end
end

# Additive noise
function kg(du, u, h, p, t)
    du .= sqrt(2 * p.σ)
end

w, l = get_tvb_connectome()
nn = size(w, 1)
ic = rand(nn) .* 2 .* π
kh(p, t; idxs = nothing) = idxs === nothing ? ic : ic[idxs]

prob = SDDEProblem(kf, kg, ic, kh, (0.0, 500.0), kp(0.1, 0.01, w, l))
@time sol = solve(prob, SRIW1(), dt = 1.0, reltol = 1e-3, abstol = 1e-3)
```
The Kuramoto oscillator model is the simplest one we might use; others are more involved, but they are all fairly similar in structure.
We intend to make work-precision diagrams for a few solvers on models like this (which are frequently memory-bound because of the time delays), so if/when that's available, I'll also post a link here.
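A minimal sketch of what such a tolerance sweep might look like, assuming the prob from the snippet above; the tolerance grid and the recorded quantities (wall time and step count, rather than a full error-vs-work diagram) are illustrative choices, not part of the original post:

```julia
# Sweep tolerances and record runtime and step count for each solve.
# A proper work-precision diagram would also measure error against a
# reference solution computed with a much tighter tolerance.
tols = 10.0 .^ (-2:-1:-5)
results = map(tols) do tol
    t = @elapsed sol = solve(prob, SRIW1(), dt = 1.0, reltol = tol, abstol = tol)
    (tol = tol, seconds = t, nsteps = length(sol.t))
end
```

Because each solve uses an independent noise realization, runtimes and step counts will vary between runs; averaging over several repetitions per tolerance would give smoother curves.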
That's perfect, thanks.
I'm making a little progress toward the work-precision diagram, but I noticed that tightening abstol down to 1e-5 increases runtime, while at 1e-6 the runtime drops. @time shows the allocations follow the runtime, so I guess the adaptivity is taking fewer steps. This is surprising to me, but I wondered if it could be related to the delays?
There are quite a few cases where this happens, even in ODEs. It's rare, but sometimes solving a problem more accurately makes it better behaved, since being off the manifold can induce things like stiffness; it's not exactly monotonic w.r.t. tolerance. That said, it should be "mostly monotonic" with respect to tolerance.