ComplexityMeasures.jl

Walkthrough entropy

Open kahaaga opened this issue 2 years ago • 4 comments

What is this PR?

This PR implements the walkthrough entropy (Stoop et al., 2021) for a symbol sequence x.

Walkthrough entropy is the first step towards implementing excess entropy (#69), which is just a normalized and averaged version of the walkthrough entropy, but it is a useful measure in itself - hence this PR.

Excess entropy will be pursued in a separate PR. The reason is that I'm having some trouble understanding the implementation of the normalization step (commented on in the _walkthrough_entropy function docstring). I will investigate this further and submit a PR when ready (if you, @Datseris, or someone else has any input, feel free to comment).

Interface

  • walkthrough_entropy(x, n) computes the walkthrough entropy for x at position(s) n, where 1 <= n <= length(x).
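The interface above could be exercised like this (a sketch; passing a range of positions follows the "position(s) n" wording in this PR and is an assumption about the final API):

```julia
using ComplexityMeasures

x = rand(['a', 'b', 'c'], 100)      # a symbol sequence of length 100
h = walkthrough_entropy(x, 50)      # walkthrough entropy at the single position n = 50
hs = walkthrough_entropy(x, 1:100)  # entropy profile over all valid positions (assumed)
```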

Internal changes

  • Implemented a generic EntropyGenerator struct with a corresponding entropygenerator(x, method [, rng]) constructor (like we do in TimeseriesSurrogates). Why? Walkthrough entropy is a function of the position n; without a generator, we'd repeat the initial calculations (histogram estimation) for every n, at a cost that grows linearly with length(x).
  • For completeness, other methods should implement entropygenerator(x, method [, rng]) too, but I haven't done so yet, pending feedback on this approach.
  • Added another histogram method, vec_countmap(x), which returns both the unique elements of x and their frequencies. The element type of the frequencies can be customized (it defaults to BigInt, which is needed to avoid overflow in the binomial calculations with large n/N that the walkthrough entropy requires).
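The generator pattern described above could be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the WalkthroughEntropy method type, the field names, and the call signature are all assumptions.

```julia
# Hypothetical method type for the walkthrough entropy (name assumed).
struct WalkthroughEntropy end

# A generator bundles the input, the method, and precomputed state,
# so per-position evaluations don't redo the histogram pass.
struct EntropyGenerator{M, X, S}
    method::M
    x::X
    init::S   # precomputed state, e.g. the histogram from vec_countmap
end

function entropygenerator(x, method::WalkthroughEntropy)
    # One-time histogram pass: unique symbols and their BigInt frequencies.
    uniques, freqs = vec_countmap(x)
    return EntropyGenerator(method, x, (uniques = uniques, freqs = freqs))
end

# Evaluating at position n then reuses `init` instead of re-counting:
#   eg = entropygenerator(x, WalkthroughEntropy())
#   h = eg(n)
```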

(Currently) Unused files

  • walkthrough_prob.jl is currently unused, but is included for completeness, to reproduce Stoop et al. (2021). It is conceivable that these methods become useful in some future algorithm. Note: the factorials blow up quickly, so these methods should only be used for experimentation.

Potential future improvements

  • Use specialized methods for BigInt calculations. At the moment, these are quite slow and allocate a lot.
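To illustrate why the frequencies default to BigInt: the binomial coefficients appearing in the walkthrough calculations overflow 64-bit integers already at modest sequence lengths.

```julia
# Exact arbitrary-precision result (via GMP); C(1000, 500) has ~300 digits.
binomial(big(1000), big(500))

# The Int64 version throws instead of silently wrapping:
try
    binomial(1000, 500)
catch e
    @assert e isa OverflowError
end
```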

Testing

The original paper doesn't provide any concrete numerical examples to test against, so the tests are generic. However, in the documentation example, I successfully reproduce the basic examples in Figure 1 of Stoop et al. (2021).

References

  • Stoop, R. L., Stoop, N., Kanders, K., & Stoop, R. (2021). Excess entropies suggest the physiology of neurons to be primed for higher-level computation. Physical Review Letters, 127(14), 148101.

kahaaga avatar Aug 25 '22 07:08 kahaaga

Codecov Report

Merging #70 (dd10654) into main (3c95c69) will increase coverage by 1.81%. The diff coverage is 91.83%.

@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   79.32%   81.13%   +1.81%     
==========================================
  Files          21       25       +4     
  Lines         624      721      +97     
==========================================
+ Hits          495      585      +90     
- Misses        129      136       +7     
Impacted Files Coverage Δ
src/walkthrough/walkthrough_entropy.jl 87.75% <87.75%> (ø)
src/histogram_estimation.jl 95.91% <92.00%> (-0.09%) :arrow_down:
src/api.jl 100.00% <100.00%> (ø)
src/walkthrough/walkthrough.jl 100.00% <100.00%> (ø)
src/walkthrough/walkthrough_prob.jl 100.00% <100.00%> (ø)


codecov[bot] avatar Aug 25 '22 08:08 codecov[bot]

I'll need to review this in detail before we merge, but a detailed review may take a month or two...

Datseris avatar Aug 25 '22 08:08 Datseris

I'll need to review this in detail before we merge, but a detailed review may take a month or two...

No pressure! I'm just tagging you for review here so you're aware of the PR.

I will probably be able to figure out the missing normalization step in the meantime, so that the excess entropy estimator can also be included by then.

kahaaga avatar Aug 25 '22 09:08 kahaaga

After the package redesign, it is clear that the walkthrough entropy isn't really an entropy - but it is an information measure of some sort according to our definition, since it is a function of probabilities. It is possible to create an OutcomeSpace that follows the reasoning in the paper and to define counts over the outcomes, and this outcome space is worth having. However, it isn't clear to me how these counts would relate to the probabilities that the paper uses to define its "walkthrough entropy". I have to dig a bit deeper.

This PR will be modified accordingly.

kahaaga avatar Dec 29 '23 21:12 kahaaga