ComplexityMeasures.jl
Walkthrough entropy
What is this PR?
This PR implements the walkthrough entropy (Stoop et al., 2021) for a symbol sequence `x`.
Walkthrough entropy is the first step towards implementing excess entropy (#69), which is just a normalized and averaged version of the walkthrough entropy. The walkthrough entropy is a useful measure in itself, hence this PR.
Excess entropy will be pursued in another PR. The reason is that I'm having some issues understanding the implementation of the normalization step (see the comment in the `_walkthrough_entropy` function docstring). I will investigate this further and submit a PR when ready (if you, @Datseris, or someone else has any input, feel free to comment).
Interface
- `walkthrough_entropy(x, n)` computes the walkthrough entropy for `x` at position(s) `n`, where `1 <= n <= length(x)`.
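
A minimal usage sketch (illustrative data; this assumes, as "position(s)" above suggests, that `n` can be either a single integer or a collection of positions):

```julia
using ComplexityMeasures

x = rand(0:1, 100)   # an illustrative binary symbol sequence

# Walkthrough entropy at a single position ...
h50 = walkthrough_entropy(x, 50)

# ... or at a collection of positions.
hs = walkthrough_entropy(x, 1:length(x))
```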
Internal changes
- Implemented a generic `EntropyGenerator` struct with a corresponding `entropygenerator(x, method [, rng])` (like we do in TimeseriesSurrogates). Why? The walkthrough entropy is a function of the position `n`, and without a generator we would have to repeat the initial calculations (histogram estimation) for every position, a cost that grows linearly with `length(x)`. See the first sketch after this list.
- For completeness, other methods should implement `entropygenerator(x, method [, rng])` too, but I haven't done so yet, pending feedback on this approach.
- Added another histogram method, `vec_countmap(x)`, which returns both the unique elements of `x` and their frequencies. The element type of the frequencies can be customized (it defaults to `BigInt`, which is needed to avoid overflow in the binomial calculations with large `n`/`N` used by the walkthrough entropy). See the second sketch after this list.
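
The first sketch shows the intended generator workflow. The method type name `WalkthroughEntropy()` and the call syntax `eg(n)` are assumptions made for illustration (by analogy with `surrogenerator` in TimeseriesSurrogates); see the source for the exact names.

```julia
using ComplexityMeasures

x = rand(0:1, 10_000)

# The expensive setup (histogram estimation over `x`) happens once, inside the generator.
# `WalkthroughEntropy()` is an assumed method name, used here only for illustration.
eg = entropygenerator(x, WalkthroughEntropy())

# Evaluating at different positions then reuses the precomputed counts, so the
# O(length(x)) setup is not repeated for every position.
hs = [eg(n) for n in 1:length(x)]
```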
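
The second sketch is a rough stand-in for what `vec_countmap` does; it is not the PR's actual implementation, but it illustrates the return values and why the frequency type is parameterized (defaulting to `BigInt`).

```julia
# Illustrative stand-in for `vec_countmap`; the real implementation may differ.
function vec_countmap_sketch(x::AbstractVector; T::Type = BigInt)
    counts = Dict{eltype(x), T}()
    for xi in x
        counts[xi] = get(counts, xi, zero(T)) + one(T)
    end
    uniques = collect(keys(counts))
    freqs = T[counts[u] for u in uniques]   # frequencies as `T` (`BigInt` by default)
    return uniques, freqs
end

u, f = vec_countmap_sketch([1, 2, 2, 3, 3, 3])
# `u` contains the unique elements (order given by the Dict), `f` their BigInt counts.
```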
(Currently) Unused files
- `walkthrough_prob.jl` is currently unused, but is included for completeness, for reproducing Stoop et al. (2021). It is conceivable that these methods become useful in some future algorithm. Note: the factorials involved quickly blow up, so these methods should only be used for experimentation.
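
To make the "factorials quickly blow up" caveat concrete, here is how quickly fixed-width integers fail (these are Base Julia facts, not code from this PR):

```julia
factorial(20)        # 2432902008176640000, the largest factorial that fits in Int64
# factorial(21)      # throws an OverflowError for 64-bit integers
factorial(big(21))   # 51090942171709440000, exact with BigInt

# The same holds for the binomial coefficients used by the walkthrough entropy:
binomial(big(1000), 500)   # exact ~300-digit result; binomial(1000, 500) throws an OverflowError
```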
Potential future improvements
- Use specialized methods for the `BigInt` calculations. These are currently quite slow and allocate a lot.
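
For reference, even a single `BigInt` binomial allocates, so loops over many positions allocate proportionally (numbers are machine-dependent):

```julia
# Every BigInt operation allocates a fresh number, so binomial-heavy loops allocate a lot.
bytes = @allocated binomial(big(10_000), 5_000)
println("bytes allocated by one BigInt binomial: ", bytes)
```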
Testing
The original paper doesn't provide any concrete examples to test on, so tests are generic. However, in the documentation example, I successfully reproduce the basic examples in Figure 1 from Stoop et al. (2021).
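
Schematically, the documentation example evaluates the walkthrough entropy at every position and compares the resulting curve with the figure; something along these lines (the sequence below is only an illustrative stand-in, not the exact sequence from the paper):

```julia
using ComplexityMeasures

# An illustrative two-block binary sequence (not the sequence used in Fig. 1).
x = [fill(0, 50); fill(1, 50)]

# Walkthrough entropy as a function of the position n; plotting `hs` against `n`
# gives the kind of curve that is compared with the paper's figure.
hs = [walkthrough_entropy(x, n) for n in 1:length(x)]
```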
References
- Stoop, R. L., Stoop, N., Kanders, K., & Stoop, R. (2021). Excess entropies suggest the physiology of neurons to be primed for higher-level computation. Physical Review Letters, 127(14), 148101.
Codecov Report
Merging #70 (dd10654) into main (3c95c69) will increase coverage by 1.81%. The diff coverage is 91.83%.
| | main | #70 | +/- |
|---|---|---|---|
| Coverage | 79.32% | 81.13% | +1.81% |
| Files | 21 | 25 | +4 |
| Lines | 624 | 721 | +97 |
| Hits | 495 | 585 | +90 |
| Misses | 129 | 136 | +7 |
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/walkthrough/walkthrough_entropy.jl | 87.75% <87.75%> (ø) | |
| src/histogram_estimation.jl | 95.91% <92.00%> (-0.09%) | :arrow_down: |
| src/api.jl | 100.00% <100.00%> (ø) | |
| src/walkthrough/walkthrough.jl | 100.00% <100.00%> (ø) | |
| src/walkthrough/walkthrough_prob.jl | 100.00% <100.00%> (ø) | |
I'll need to review this in detail before we merge, but a detailed review may take a month or two...
No pressure! I'm just tagging you for review here so you're aware of the PR.
I will probably be able to figure out the missing normalization step in the meantime, so that the excess entropy estimator can also be included by then.
After the package redesign, it is clear that the walkthrough entropy isn't really an entropy, but it is an information measure of some sort according to our definition, since it is a function of probabilities. It is possible to create an `OutcomeSpace` which follows the reasoning in the paper and to define counts over the outcomes, and this outcome space is worth having. However, it isn't clear to me how these counts would relate to the probabilities that the paper uses to define the "walkthrough entropy". I have to dig a bit deeper.
This PR will be modified accordingly.