DelayEmbeddings.jl

Mutual information between two timeseries?

Open Datseris opened this issue 5 years ago • 9 comments

Hi @heliosdrm, I am wondering... We have this great mutual information function. At the moment I have two timeseries x, y and I want to calculate the mutual information between x and y delayed by τ.

At the moment I am using "InformationMeasures", but I am not really happy with that package. It has bad syntax and doesn't even have a Project.toml... I want to get rid of it and add such a method here. I was thinking of copy-pasting their code here and making it drastically smaller, with simpler syntax, but first I should ask you: is it possible to add mutual information between two timeseries here, given the method you have written?

(To clarify: I don't have a good idea how to get the mutual information of two variables besides doing all the histograms from scratch. That's why I use a package.)
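
For reference, the "histograms from scratch" route can be sketched in a few lines. This is only a minimal plug-in estimate under stated assumptions, not the method used in DelayEmbeddings.jl: the names `bin_indices` and `delayed_mi` are hypothetical, the bins are equal-width, the bin count `nbins` is arbitrary, and the delay is assumed to satisfy τ ≥ 0.

```julia
# Hypothetical helper: map each value to an equal-width bin index in 1:nbins.
function bin_indices(v::AbstractVector{<:Real}, nbins::Int)
    lo, hi = extrema(v)
    hi == lo && return ones(Int, length(v))   # constant signal: a single bin
    return [min(nbins, floor(Int, nbins * (vi - lo) / (hi - lo)) + 1) for vi in v]
end

# Hypothetical function (not part of DelayEmbeddings.jl).
# Plug-in estimate of I(x(t); y(t+τ)) in nats from a 2D histogram, for τ ≥ 0.
function delayed_mi(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}, τ::Int; nbins::Int = 8)
    length(x) == length(y) || throw(ArgumentError("x and y must have equal length"))
    0 <= τ < length(x) || throw(ArgumentError("need 0 ≤ τ < length(x)"))
    n = length(x) - τ
    bx = bin_indices(view(x, 1:n), nbins)          # bins of x(t)
    by = bin_indices(view(y, 1+τ:n+τ), nbins)      # bins of y(t+τ)
    joint = zeros(nbins, nbins)
    for (i, j) in zip(bx, by)
        joint[i, j] += 1
    end
    joint ./= n                                    # joint probabilities
    px = sum(joint, dims = 2)                      # marginal of x
    py = sum(joint, dims = 1)                      # marginal of y
    mi = 0.0
    for j in 1:nbins, i in 1:nbins
        p = joint[i, j]
        p > 0 && (mi += p * log(p / (px[i] * py[j])))
    end
    return mi
end

x = sin.(0.1 .* (1:2000))
y = cos.(0.1 .* (1:2000))
delayed_mi(x, y, 5)
```

The estimate depends on the bin count and is biased for short series, so it is best read as a baseline to compare other estimators against.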

Datseris avatar Apr 09 '20 10:04 Datseris

Hi @heliosdrm, have you seen this message? If you don't have time to do anything, it would still be good to just say so, so that I know you have seen this and can try to do things on my own.

Datseris avatar Jun 09 '20 09:06 Datseris

@Datseris In CausalityToolsBase, I define RectangularBinning. It basically provides different pre-defined ways of partitioning the state space. I use this for the binning-based transfer entropy estimators.

The current implementation of the mutualinformation function here is based on binning. It would be nice if mutualinformation here could dispatch on different estimators, so that the estimator types contain parameters necessary for the computations.

I am also currently working on symbolic estimators for transfer entropy over in CausalityTools.jl, which use different variants of permutation entropy. It would be straightforward to customize these estimators for delayed MI. If you're interested, I can attempt a PR.

What I am imagining is something like this:

mutualinformation(x, y, lag::Int, method::RectangularBinning)  # customized rectangular binning
mutualinformation(x, y, lag::Int, method::RecursiveBisection)  # recursive bisection binning
mutualinformation(x, y, lag::Int, method::KraskovKNN)  # the "old" mi
mutualinformation(x, y, lag::Int, method::PermutationEntropy) 
mutualinformation(x, y, lag::Int, method::WeightedPermutationEntropy)

If desired, I can start a PR with an API and contribute the permutation-based methods, and someone else can take care of the binning-based methods?
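
To make the dispatch idea concrete, here is a minimal self-contained sketch. Everything in it is an assumption for illustration: the abstract type `MIEstimator`, the two estimator structs, and the helpers `symbolize` and `discrete_mi` are hypothetical stand-ins, not the CausalityTools.jl or DelayEmbeddings.jl API. Each estimator type carries its own parameters and decides how a timeseries is turned into discrete symbols; the mutual information itself is then computed the same way for all of them. The lag is assumed to be non-negative.

```julia
abstract type MIEstimator end              # hypothetical abstract type

struct EqualWidthBinning <: MIEstimator    # hypothetical binning estimator
    nbins::Int
end

struct OrdinalPatterns <: MIEstimator      # hypothetical permutation-based estimator
    m::Int                                 # pattern length
end

# Each estimator defines how a timeseries becomes a sequence of discrete symbols.
function symbolize(v, est::EqualWidthBinning)
    lo, hi = extrema(v)
    hi == lo && return ones(Int, length(v))
    return [min(est.nbins, floor(Int, est.nbins * (vi - lo) / (hi - lo)) + 1) for vi in v]
end

symbolize(v, est::OrdinalPatterns) =
    [sortperm(view(v, i:i+est.m-1)) for i in 1:length(v)-est.m+1]

# Plug-in mutual information (nats) between two equally long symbol sequences.
function discrete_mi(a, b)
    n = length(a)
    pab = Dict{Any,Float64}(); pa = Dict{Any,Float64}(); pb = Dict{Any,Float64}()
    for (ai, bi) in zip(a, b)
        pab[(ai, bi)] = get(pab, (ai, bi), 0.0) + 1 / n
        pa[ai] = get(pa, ai, 0.0) + 1 / n
        pb[bi] = get(pb, bi, 0.0) + 1 / n
    end
    return sum(p * log(p / (pa[k[1]] * pb[k[2]])) for (k, p) in pab)
end

# One generic method; the estimator type selects the symbolization via dispatch.
function mutualinformation(x, y, lag::Int, est::MIEstimator)
    sx, sy = symbolize(x, est), symbolize(y, est)
    n = length(sx) - lag                   # assumes 0 ≤ lag < length(sx)
    return discrete_mi(view(sx, 1:n), view(sy, 1+lag:n+lag))
end

x = sin.(0.1 .* (1:500))
y = cos.(0.1 .* (1:500))
mutualinformation(x, y, 3, EqualWidthBinning(8))
mutualinformation(x, y, 3, OrdinalPatterns(3))
```

With this layout, adding a new estimator only requires a new struct and one `symbolize` method; the call signature stays the same.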

kahaaga avatar Jun 09 '20 10:06 kahaaga

Yeah, I've been thinking about that and I think it is worth combining efforts.

The method here uses binning, correct, but it is an optimized version because it can only compute the self mutual information with a time delay. But I was wondering: in CausalityTools.jl you would have a mutual information calculation anyway, right? So we could potentially use that version for the two-timeseries case. Is it in TransferEntropy.jl?

Datseris avatar Jun 09 '20 11:06 Datseris

entropy(x, method::EntropyEstimator)

Datseris avatar Jun 16 '20 14:06 Datseris

@kahaaga I think it is worth also exposing the interface

probabilities(x, method::EntropyEstimator)  # customized rectangular binning

that simply calculates the probabilities p_k, which are then passed to the generalized entropy formula.
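
A minimal sketch of that split, assuming the "generalized entropy" meant here is the Rényi entropy of order α (with the Shannon entropy as the α = 1 limit). The abstract type name `EntropyEstimator` is taken from this thread but defined locally; the struct `EqualWidthBins` and the function `renyi_entropy` are hypothetical names used only for illustration.

```julia
abstract type EntropyEstimator end          # name from the thread, defined locally here

struct EqualWidthBins <: EntropyEstimator   # hypothetical estimator
    nbins::Int
end

# Step 1: the estimator turns data into probabilities p_k (empty bins are dropped).
function probabilities(x, est::EqualWidthBins)
    lo, hi = extrema(x)
    counts = zeros(Int, est.nbins)
    for xi in x
        k = hi == lo ? 1 : min(est.nbins, floor(Int, est.nbins * (xi - lo) / (hi - lo)) + 1)
        counts[k] += 1
    end
    return [c / length(x) for c in counts if c > 0]
end

# Step 2: the entropy formula only ever sees the probabilities.
# Rényi entropy of order α in nats (assumed meaning of "generalized entropy").
function renyi_entropy(p; α = 1.0)
    α ≈ 1 && return -sum(pk * log(pk) for pk in p)
    return log(sum(pk^α for pk in p)) / (1 - α)
end

# The user-facing call just chains the two steps.
entropy(x, est::EntropyEstimator; α = 1.0) = renyi_entropy(probabilities(x, est); α = α)

x = collect(range(0, 1; length = 1000))
entropy(x, EqualWidthBins(4))
entropy(x, EqualWidthBins(4); α = 2)
```

Exposing `probabilities` separately means any quantity that is a function of the p_k can reuse the same estimators without touching the entropy code.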

Datseris avatar Jun 17 '20 11:06 Datseris

And for clarity, perhaps we should use est instead of method (for estimator), and maybe change the abstract type to ProbabilitiesEstimator?

Datseris avatar Jun 17 '20 11:06 Datseris

To get the (unordered) probabilities for my marginal x, I call

probabilities(x, est::ProbabilitiesEstimator)

Am I understanding you correctly?

kahaaga avatar Jun 17 '20 11:06 kahaaga

Yeah, but for me x is an ordered set, or an ordered timeseries, not a marginal (but maybe we are saying the same thing). The probabilities themselves are indeed typically unordered, or their order doesn't matter.

Datseris avatar Jun 17 '20 11:06 Datseris

I think we're speaking of the same thing.

x is typically some Dataset, which is ordered; it may be the entire multidimensional dataset or some subset (marginal) of it. The probabilities, however, are unordered.
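
The point that the order of the probabilities does not matter can be checked directly: an entropy is a sum over the p_k, so any reordering gives the same value. A small sketch, using a hypothetical helper `shannon` and made-up probabilities:

```julia
# Hypothetical helper: Shannon entropy (nats) of a probability vector; zeros are skipped.
shannon(p) = -sum(pk * log(pk) for pk in p if pk > 0)

p = [0.5, 0.25, 0.125, 0.125]
q = p[[3, 1, 4, 2]]        # same probabilities, different order

shannon(p) ≈ shannon(q)
```

The order of the input data x is a separate matter: estimators based on temporal patterns, such as permutation entropy, do depend on it.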

kahaaga avatar Jun 17 '20 11:06 kahaaga