ComplexityMeasures.jl icon indicating copy to clipboard operation
ComplexityMeasures.jl copied to clipboard

Optimize transfer operator estimation

Open rusandris opened this issue 6 months ago • 5 comments

We were looking into reducing the overlap between our package StateTransitionNetworks.jl and ComplexityMeasures.

Since the transition matrix or transfer (Perron-Frobenius) operator plays a central role in our package, we also wrote code for it and we noticed that the approach here used in transferoperator can be made more minimal and more performant.

This measures the time to get the transition matrix from a fairly long ($10^7$) orbit of the Hénon map using the two ways: transferoperator from ComplexityMeasures and calculate_transition_matrix from StateTransitionNetworks. 2D binning is used to discretize in both.

using DynamicalSystems
using BenchmarkTools

#Henon system orbit
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 10^7; Ttr = 10^4)
grid_size = 10 #10x10 grid binning
#grid_size = 100 #100x100 grid binning

#-----------------STN way-------------------
using StateTransitionNetworks

@btime begin
symbolic_ts = timeseries_to_grid(orbit,grid_size) #get the symbolic time series
P_stn,Q_stn,x_stn = calculate_transition_matrix(symbolic_ts) #calculate transition matrix
P_stn 
end
#for grid_size = 10 ->  1.480 s (20000065 allocations: 1.64 GiB)
#for grid_size = 100 -> 1.602 s (20000083 allocations: 1.64 GiB)
#----------------CM way----------------------

using ComplexityMeasures

@btime begin
binning = RectangularBinning(grid_size)
to = ComplexityMeasures.transferoperator(orbit, binning);
P_cm = to.transfermatrix
end
#for grid_size = 10 ->  2.098 s (1545 allocations: 828.16 MiB)
#for grid_size = 100 -> 3.333 s (19205 allocations: 830.85 MiB)

#are these the same?
println(P_stn.nzval[1:3]) #[0.5980376275657787, 0.7666443626630981, 0.6996943940406837]
println(P_cm.nzval[1:3]) #[5.894488653109343e-6, 0.5980376275657788, 0.766644362663098]
#yes except one additional value in P_cm which is almost zero

all(isapprox.(P_stn.nzval, P_cm.nzval[2:end];atol=1e-3)) #true

I think one of the reasons for these differences in runtime and scaling is that the visitors list used in transferoperator isn't actually necessary if we just want the transition matrix. Right now a list is created (that contains all visitors of each bin) and transition probabilities are computed by looping through the visitors list.

This is what I mean by making more minimal. If visitors isn't actually used anywhere else, it can be skipped and the transition probabilities can be calculated by looping through the symbolic orbit (series of outcomes, pts) once.

I've thrown together an initial PR just to show the difference: #423 (very rough, but some features can be added back later like boundary conditions, stochasticity checking etc. )

To illustrate this point further, I timed (using simple @time inside transferoperator) how much is actually taken up by creating visitors and the calculation of probabilities:

#---------------------with visitors (original)-------------------
@time ComplexityMeasures.transferoperator(pts, binning);
visitors list time:
  0.787510 seconds 
loop time (compute probabilities):
  0.257154 seconds 

Total  1.747746 seconds 

#---------------------without visitors (PR)-------------------
@time ComplexityMeasures.transferoperator(pts, binning);
loop time compute probabilities): 
  0.157713 seconds 

Total: 0.730922 seconds 

Let me know if this is worth the effort. There was a discussion about this on Slack with @Datseris and @sbulcsu but we decided to move it here. Also see #424 about generalizing the transfer operator because the design here might also change if we decide to introduce a transferoperator that's not tied to binning.

rusandris avatar Aug 18 '24 09:08 rusandris