Associations.jl
Associations.jl copied to clipboard
WIP: Compression complexity causality
What is this PR?
This PR implements the effort-to-compress algorithm that underlies the Compression-Complexity Causality (CCC) in #161.
It also implements the CCC algorithm in #161.
TODO
- [x] Uni- and multivariate effort-to-compress.
- [x] Joint effort-to-compress.
- [ ] CCC univariate
- [ ] CCC multivariate
Interface for compression complexity
I've decided on a dispatch-on-algorithm approach to allow a cleaner interface in the case that future compression complexity measures (e.g. Lempel-Siv complexity) are added.
The syntax is compression_complexity(x, algorithm::CompressionComplexityAlgorithm)
.
Input time series x
(and y
, for joint estimation) must always be pre-symbolized.
Effort-to-compress (ETC)
I've done an implementation of the ETC algorithm from scratch, based on the Nagaraj (2013; for single time series) and Kathpalia & Nagaraj (2019; for joint estimation). ETC is the only compression complexity measure included in this PR.
The syntax is:
-
Regular ETC:
compression_complexity(x, EffortToCompress())
. Computed on a single univariate (AbstractVector{Integer}
) or multivariate (AbstractDataset{D, Integer}
) pre-symbolized time seriesx
. -
Joint ETC:
compression_complexity(x, y, EffortToCompress())
. Also computed on pre-binned time seriesx
andy
.
See the tests for some examples.
Performance
My initial implementation of ETC is pretty naive - it uses dictionaries for histogram estimation. The algorithm can probably be optimised substantially by using pre-allocated arrays and clever indexing during the recursion step. However, at this stage I'm keeping it simple, just to have a working algorithm. I haven't looked at the MATLAB/Python implementations yet, so not sure what the speed-ups could be. This will have to be tested at some point.
Testing
Tests are based on examples from the papers linked in #161, and by comparison to results from the MATLAB scripts provided in the references given in the original papers. Additional tests for obvious edge cases and examples have also been added to the test suite.
All tests pass, so I'm fairly sure the implementation is correct.
Below is a reproduction of figure 3 in Nagaraj (2013), which compares the (approximate) Lyapunov exponent and compression complexity (ETC) of 200-pt long logistic map time series with varying bifurcation parameters. For ETC, the time series was symbolized to a binary time series by splitting above and below x = 0.5
. Lyapunov exponents are estimated directly on the time series.

Codecov Report
Merging #162 (c0fd2b9) into master (f1bc2b7) will increase coverage by
21.85%
. The diff coverage is93.63%
.
:exclamation: Current head c0fd2b9 differs from pull request most recent head 9418012. Consider uploading reports for the commit 9418012 to get more accurate results
@@ Coverage Diff @@
## master #162 +/- ##
===========================================
+ Coverage 53.75% 75.60% +21.85%
===========================================
Files 154 48 -106
Lines 4757 1025 -3732
===========================================
- Hits 2557 775 -1782
+ Misses 2200 250 -1950
Impacted Files | Coverage Δ | |
---|---|---|
src/CausalityTools.jl | 0.00% <ø> (ø) |
|
...sures/dynamical_complexity/dynamical_complexity.jl | 58.62% <58.62%> (ø) |
|
...yCausality/ccc/compression_complexity_causality.jl | 88.23% <88.23%> (ø) |
|
...on_complexity/effort_to_compress/etc_algorithms.jl | 100.00% <100.00%> (ø) |
|
...ression_complexity/effort_to_compress/etc_joint.jl | 100.00% <100.00%> (ø) |
|
..._complexity/effort_to_compress/etc_multivariate.jl | 100.00% <100.00%> (ø) |
|
...on_complexity/effort_to_compress/etc_univariate.jl | 100.00% <100.00%> (ø) |
|
...ression_complexity/effort_to_compress/etc_utils.jl | 100.00% <100.00%> (ø) |
|
...ventionalComplexityCausality/ccc/ccc_estimators.jl | 100.00% <100.00%> (ø) |
|
src/utils/sliding_window.jl | 100.00% <100.00%> (ø) |
... and 190 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
perhaps @AditiKathpalia is interested to review this PR?
(I'm super swamped at the moment, but will return here in a couple of months)
perhaps @AditiKathpalia is interested to review this PR?
That would be awesome!
If you're interested @AditiKathpalia, let me know, and I'll tag you here once it's ready for review.
Yes, sure, I will review it. Thanks for implementing this.
Hey @AditiKathpalia!
I'm trying to implement the full CCC algorithm, but when trying to compare my results with the implementation in the MATLAB toolbox from the supplement of the paper, I'm not able to run the examples in the code.
The documentation for the conditional_CCC
function references a Main.m
file, which is missing, so I can't run the given examples.
I also tried to follow the instructions in the function documentation and run an example myself:
N = 1000
INFO_CCC = struct('L',150,'w',15,'delta',80,'B',2,'N',N)
[y1, y2] = coupled_AR(N);
[CCC_value, k] = conditional_CCC([y1; y2], 1, 2, INFO_CCC)
but get the error
Index in position 2 exceeds array bounds. Index must not exceed 1.
Error in conditional_CCC (line 61)
ETC_ini(k)=ETC(data_reduced(:,i:i+LEN_past-1),Num_bins);
I could spend more time deciphering the MATLAB code and figuring out what's wrong (or what I'm misunderstanding in the documentation string). However, given that there are files missing from the toolbox, I think it would be much faster if you provided an up-to-date, working example that I can use as a basis for comparison with my implementation.
Do you have an example of a simple CCC(x -> y)
computation between two univariate time series x
and y
, using the code in MATLAB toolbox (as dowloaded from the paper)?
You can just post code snippet in a comment here, and I'll figure out the rest.
EDIT: By the way, I'm running MATLAB version 2021b
- if that should matter.