Associations.jl icon indicating copy to clipboard operation
Associations.jl copied to clipboard

WIP: Compression complexity causality

Open kahaaga opened this issue 2 years ago • 6 comments

What is this PR?

This PR implements the effort-to-compress algorithm that underlies the Compression-Complexity Causality (CCC) in #161.

It also implements the CCC algorithm in #161.

TODO

  • [x] Uni- and multivariate effort-to-compress.
  • [x] Joint effort-to-compress.
  • [ ] CCC univariate
  • [ ] CCC multivariate

Interface for compression complexity

I've decided on a dispatch-on-algorithm approach to allow a cleaner interface in the case that future compression complexity measures (e.g. Lempel-Siv complexity) are added.

The syntax is compression_complexity(x, algorithm::CompressionComplexityAlgorithm).

Input time series x (and y, for joint estimation) must always be pre-symbolized.

Effort-to-compress (ETC)

I've done an implementation of the ETC algorithm from scratch, based on the Nagaraj (2013; for single time series) and Kathpalia & Nagaraj (2019; for joint estimation). ETC is the only compression complexity measure included in this PR.

The syntax is:

  • Regular ETC: compression_complexity(x, EffortToCompress()). Computed on a single univariate (AbstractVector{Integer}) or multivariate (AbstractDataset{D, Integer}) pre-symbolized time series x.
  • Joint ETC: compression_complexity(x, y, EffortToCompress()). Also computed on pre-binned time series x and y.

See the tests for some examples.

Performance

My initial implementation of ETC is pretty naive - it uses dictionaries for histogram estimation. The algorithm can probably be optimised substantially by using pre-allocated arrays and clever indexing during the recursion step. However, at this stage I'm keeping it simple, just to have a working algorithm. I haven't looked at the MATLAB/Python implementations yet, so not sure what the speed-ups could be. This will have to be tested at some point.

Testing

Tests are based on examples from the papers linked in #161, and by comparison to results from the MATLAB scripts provided in the references given in the original papers. Additional tests for obvious edge cases and examples have also been added to the test suite.

All tests pass, so I'm fairly sure the implementation is correct.

Below is a reproduction of figure 3 in Nagaraj (2013), which compares the (approximate) Lyapunov exponent and compression complexity (ETC) of 200-pt long logistic map time series with varying bifurcation parameters. For ETC, the time series was symbolized to a binary time series by splitting above and below x = 0.5. Lyapunov exponents are estimated directly on the time series.

Screenshot 2022-08-20 at 02 17 28

kahaaga avatar Aug 19 '22 08:08 kahaaga

Codecov Report

Merging #162 (c0fd2b9) into master (f1bc2b7) will increase coverage by 21.85%. The diff coverage is 93.63%.

:exclamation: Current head c0fd2b9 differs from pull request most recent head 9418012. Consider uploading reports for the commit 9418012 to get more accurate results

@@             Coverage Diff             @@
##           master     #162       +/-   ##
===========================================
+ Coverage   53.75%   75.60%   +21.85%     
===========================================
  Files         154       48      -106     
  Lines        4757     1025     -3732     
===========================================
- Hits         2557      775     -1782     
+ Misses       2200      250     -1950     
Impacted Files Coverage Δ
src/CausalityTools.jl 0.00% <ø> (ø)
...sures/dynamical_complexity/dynamical_complexity.jl 58.62% <58.62%> (ø)
...yCausality/ccc/compression_complexity_causality.jl 88.23% <88.23%> (ø)
...on_complexity/effort_to_compress/etc_algorithms.jl 100.00% <100.00%> (ø)
...ression_complexity/effort_to_compress/etc_joint.jl 100.00% <100.00%> (ø)
..._complexity/effort_to_compress/etc_multivariate.jl 100.00% <100.00%> (ø)
...on_complexity/effort_to_compress/etc_univariate.jl 100.00% <100.00%> (ø)
...ression_complexity/effort_to_compress/etc_utils.jl 100.00% <100.00%> (ø)
...ventionalComplexityCausality/ccc/ccc_estimators.jl 100.00% <100.00%> (ø)
src/utils/sliding_window.jl 100.00% <100.00%> (ø)

... and 190 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Aug 19 '22 09:08 codecov-commenter

perhaps @AditiKathpalia is interested to review this PR?

Datseris avatar Aug 19 '22 11:08 Datseris

(I'm super swamped at the moment, but will return here in a couple of months)

Datseris avatar Aug 19 '22 11:08 Datseris

perhaps @AditiKathpalia is interested to review this PR?

That would be awesome!

If you're interested @AditiKathpalia, let me know, and I'll tag you here once it's ready for review.

kahaaga avatar Aug 19 '22 11:08 kahaaga

Yes, sure, I will review it. Thanks for implementing this.

AditiKathpalia avatar Aug 21 '22 20:08 AditiKathpalia

Hey @AditiKathpalia!

I'm trying to implement the full CCC algorithm, but when trying to compare my results with the implementation in the MATLAB toolbox from the supplement of the paper, I'm not able to run the examples in the code.

The documentation for the conditional_CCC function references a Main.m file, which is missing, so I can't run the given examples.

I also tried to follow the instructions in the function documentation and run an example myself:

N = 1000
INFO_CCC = struct('L',150,'w',15,'delta',80,'B',2,'N',N)
[y1, y2] = coupled_AR(N);
[CCC_value, k] = conditional_CCC([y1; y2], 1, 2, INFO_CCC)

but get the error

Index in position 2 exceeds array bounds. Index must not exceed 1.

Error in conditional_CCC (line 61)
        ETC_ini(k)=ETC(data_reduced(:,i:i+LEN_past-1),Num_bins);

I could spend more time deciphering the MATLAB code and figuring out what's wrong (or what I'm misunderstanding in the documentation string). However, given that there are files missing from the toolbox, I think it would be much faster if you provided an up-to-date, working example that I can use as a basis for comparison with my implementation.

Do you have an example of a simple CCC(x -> y) computation between two univariate time series x and y, using the code in MATLAB toolbox (as dowloaded from the paper)?

You can just post code snippet in a comment here, and I'll figure out the rest.

EDIT: By the way, I'm running MATLAB version 2021b - if that should matter.

kahaaga avatar Aug 23 '22 09:08 kahaaga