ComplexityMeasures.jl
Discussion on invariant measure estimation
Let's say I'm interested in getting the transfer matrix and the invariant measure for a time series.
using DynamicalSystems
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 20_000; Ttr = 500)
As the docs say, we can get an estimate for the invariant measure by using probabilities:
using ComplexityMeasures
grid_size = 20
grid_along_1d = range(-2, 2; length = grid_size)
grids = (grid_along_1d, grid_along_1d)
binning = FixedRectangularBinning(grids)
p_est = probabilities(RelativeAmount(), ValueBinning(binning), orbit) # estimate by counting visitations
According to the docs, using probabilities with TransferOperator gives the invariant measure (the stationary distribution over the states) instead of just an estimate from counting:
ρ_to = probabilities(TransferOperator(binning), orbit)
If I understand correctly, another way of doing this is with invariantmeasure:
iv = invariantmeasure(orbit, binning)
ρ_to and iv.ρ are not exactly the same, although very similar:
iv.ρ[1] == ρ_to[1] # false
iv.ρ[1] ≈ ρ_to[1] # true
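One way to quantify "very similar" (my own quick check; both vectors cover the same bins since the binning is identical):
maximum(abs.(iv.ρ .- ρ_to)) # small but nonzero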
Is this difference because of the random initialization of the distribution?
If so, the docs are not really clear on this issue (at least to me).
The TransferOperator docs say: "In practice, the invariant measure ρ is computed using invariantmeasure, which also approximates the transfer matrix. The invariant distribution is initialized as a length-N random distribution which is then applied to P."
But then the invariantmeasure docs say that the iterative process starts from the distribution estimated by counting, not a random one: "Estimate an invariant measure over the points in x based on binning the data into rectangular boxes dictated by the binning, then approximate the transfer (Perron-Frobenius) operator over the bins."
Wouldn't it be more efficient if the invariant distribution was initialized as p_est, not a random vector?
Another curiosity: is there a way to get the P transfer matrix without calling invariantmeasure first?
Let me know if I misunderstood anything.
cc @kahaaga, who is the expert on the transfer operator!
This might be a separate issue, but since it is related, I'll mention it here.
Why does one get a different number of states in the outcome space when using probabilities vs. invariantmeasure?
using DynamicalSystems
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 20_000; Ttr = 500)
using ComplexityMeasures
grid_size = 20
binning = RectangularBinning(grid_size)
p_cm = probabilities(ValueBinning(binning), orbit) # 100 states
which gives 100 states.
iv = invariantmeasure(orbit, binning)
P_cm, symbols = transfermatrix(iv) # 101 states??
which in contrast gives 101 states, despite using the same binning.
Hey @rusandris!
Is there a way to get the P transfer matrix without calling invariantmeasure first?
Yes. The transfer matrix is estimated first, then the stationary distribution is estimated from the transfer matrix. Given your example, you can do:
# This will be a `TransferOperatorApproximationRectangular`, which is just a simple struct that
# stores relevant information about the transfer operator and its binning.
julia> to = ComplexityMeasures.transferoperator(orbit, binning);
julia> to.transfermatrix
32×32 SparseArrays.SparseMatrixCSC{Float64, Int32} with 92 stored entries:
⎡⠫⠣⡀⠀⡂⠠⠀⠀⠢⠀⠀⠀⠀⠀⠀⠀⎤
⎢⣀⢀⠈⠢⠀⠀⠂⠈⠀⠊⠀⠂⠠⠄⠀⠠⎥
⎢⠠⠠⡈⠀⠈⠢⠈⠈⠈⠀⠐⠈⠀⠀⠀⠀⎥
⎢⠀⠀⠂⠈⠢⡀⠠⠢⠀⠀⠐⡀⢄⡀⠁⠀⎥
⎢⠁⠀⠂⡠⠂⠀⠈⠀⠈⠀⡀⠈⠀⠀⠀⡀⎥
⎢⠀⠀⠀⠠⠀⠐⠑⡀⠐⠀⢀⠀⠀⠈⠌⠀⎥
⎢⠒⠀⠀⢰⠀⡁⠀⠀⠀⠀⠈⠀⢀⠀⠄⠀⎥
⎣⣈⠈⠀⠀⠀⠀⠄⠀⠀⠀⠀⠄⠐⠆⠀⠀⎦
The (i, j)-th entry of this matrix is the transition probability from state (bin) i to bin j, and you can see that the number of bins is the same as the dimension of the transfer matrix:
julia> to.bins
32-element Vector{Int64}:
203
202
204
201
187
214
141
184
186
216
⋮
162
217
197
164
163
160
185
159
140
183
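As a sanity check on the convention just stated (my own quick check, not something from the docs): row i is a conditional distribution over target bins, so the row sums should be approximately one (up to bins with no recorded outgoing transition):
julia> sum(to.transfermatrix; dims = 2) # rows ≈ 1.0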
To get the invariant measure, now simply do:
iv = invariantmeasure(to)
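From there, you can pull out the transfer matrix and the measure itself just like in your snippet:
P, symbols = transfermatrix(iv) # the approximated transfer matrix and its bins
iv.ρ # the estimated invariant measure over those bins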
According to the docs, using probabilities with TransferOperator gives the invariant measure (the stationary distribution over the states) instead of just an estimate from counting: If I understand correctly, another way of doing this is with invariantmeasure:
The estimation procedure for the stationary distribution over the states is:
- Bin the data.
- Estimate an N-by-N-sized transfer matrix T using ComplexityMeasures.transferoperator (as shown above), where N is the number of bins.
- Apply T repeatedly to any length-N distribution p until convergence (see the sketch below). By convergence, I mean "p doesn't change beyond some semi-arbitrarily determined threshold". The point of this is to estimate the left eigenvector of T.
- The resulting distribution is the invariant measure, or the stationary distribution over the states.
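To make the iteration step concrete, here is a minimal, self-contained sketch of it (my own illustration, not the package's internal code). It assumes T is row-stochastic, with T[i, j] the transition probability from bin i to bin j, and takes an optional init for the starting distribution:
function stationary_distribution(T; init = nothing, tol = 1e-12, maxiter = 10_000)
    N = size(T, 1)
    p = init === nothing ? rand(N) : collect(Float64, init) # random start unless a guess is given
    p ./= sum(p)                                 # normalize to a probability vector
    for _ in 1:maxiter
        p_new = T' * p                           # left action pᵀT: one step of the chain
        p_new ./= sum(p_new)                     # guard against floating-point drift
        maximum(abs.(p_new .- p)) < tol && return p_new
        p = p_new
    end
    return p                                     # may not be fully converged
end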
The reason why you're seeing slightly different results in your example is that you're essentially estimating the invariant distribution twice (two separate calls to the same underlying function). Since the distribution p is randomly initialized, you're getting similar but slightly different results. It should be possible to specify an rng argument to TransferOperator and the underlying functions; then you'd get identical results for both calls. I'll open a separate issue for this.
But then the invariantmeasure docs say that the iterative process starts from the estimated distribution from the counting, not random: "Estimate an invariant measure over the points in x based on binning the data into rectangular boxes dictated by the binning, then approximate the transfer (Perron-Frobenius) operator over the bins. "
I'm not quite sure if I understand this comment. Perhaps it would be clearer if the sentence was re-ordered, e.g. "Bin the data into rectangular boxes according to binning, approximate the transfer (Perron-Frobenius) operator over the bins (i.e. the transfer matrix T), then find the left eigenvector of this operator by repeatedly applying T to a randomly initialized distribution p."
Wouldn't it be more efficient if the invariant distribution was initialized as p_est, not a random vector?
I suppose it is reasonable to think convergence could be achieved faster if p_est was used. I'm not sure what the runtime difference would be in practice, but it could be worth trying out.
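For instance, here is a toy comparison using the stationary_distribution sketch from above (again a hypothetical helper, not the package API), with a perturbed copy of the true measure standing in for a counting-based estimate:
using LinearAlgebra, Random
N = 200
T = rand(MersenneTwister(42), N, N)
T ./= sum(T; dims = 2)                            # make T row-stochastic
p_true = stationary_distribution(T)
p_guess = normalize(p_true .+ 0.01 .* rand(N), 1) # stand-in for a counting estimate
@time stationary_distribution(T)                  # random initialization
@time stationary_distribution(T; init = p_guess)  # warm start from the guess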
This might be a separate issue, but since it is related, I'll mention it here. Why does one get a different number of states in the outcome space when using probabilities vs. invariantmeasure?
using DynamicalSystems
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 20_000; Ttr = 500)
using ComplexityMeasures
grid_size = 20
binning = RectangularBinning(grid_size)
p_cm = probabilities(ValueBinning(binning), orbit) # 100 states
which gives 100 states.
iv = invariantmeasure(orbit, binning)
P_cm, symbols = transfermatrix(iv) # 101 states??
which in contrast gives 101 states, despite using the same binning.
Nice catch! This is related to #328. You're getting an extra bin which is symbolized as -1 (i.e. the point is outside the binning; see the third-to-last element below).
julia> to = ComplexityMeasures.transferoperator(orbit, binning);
julia> to.bins
101-element Vector{Int64}:
275
296
314
277
332
240
382
28
155
295
⋮
260
116
279
91
217
111
385
-1
328
75
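A quick one-liner to check for that out-of-binning symbol in your own runs (my convenience check, not a package function):
julia> any(==(-1), to.bins) # true if some point fell outside the binning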
I'm moving this into a separate issue.
@rusandris Hey again. On main, you can now specify the rng argument to get reproducible results.
using ComplexityMeasures, Random, Test
b = RectangularBinning(5) # any binning works here
D = StateSpaceSet(rand(MersenneTwister(1234), 100, 2))
# Resetting the rng will lead to identical results.
p1 = probabilities(TransferOperator(b; rng = MersenneTwister(1234)), D)
p2 = probabilities(TransferOperator(b; rng = MersenneTwister(1234)), D)
@test all(p1 .== p2) # true: both calls start from the same random vector when estimating the invariant measure
# Not resetting the rng will lead to non-identical results.
rng = MersenneTwister(1234)
p1 = probabilities(TransferOperator(b; rng), D)
p2 = probabilities(TransferOperator(b; rng), D)
@test !all(p1 .== p2) # true: different random vectors are drawn for p1 and p2
@test p1[1] ≈ p2[1] # but the estimates should be close
I suppose it is reasonable to think convergence could be achieved faster if p_est was used. I'm not sure what the runtime difference would be in practice, but it could be worth trying out.
Absolutely. It's not obvious which method would be faster, and it might also depend on the length of the time series, the resolution of the binning, etc.