giotto-tda icon indicating copy to clipboard operation
giotto-tda copied to clipboard

[WIP] Extended persistence and lower star filtrations

Open ulupo opened this issue 4 years ago • 6 comments

Reference issues/PRs #337 #546

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Description Begins adding support for extended persistence via a new class LowerStarFlagPersistence and a new plotting function plot_extended_diagram. Does not yet address downstream processing of extended diagrams. There is also a new data structure for extended persistence diagrams. An extended persistence diagram is a 2D ndarray of shape (n_features, 4) where the first 3 columns are as for ordinary persistence (birth-death-dimension), and the fourth is either 1 or -1: 1 when the feature was born and died during the same "sweep", -1 otherwise. This allows to partition the extended diagram into the usual 4 portions:

  • birth < death and same sweep
  • birth < death and different sweep
  • birth > death and same sweep
  • birth > death and different sweep

The extended persistence diagrams are obtained by "coning".

Numerical stability issues have not yet been addressed.

Checklist

  • [x] I have read the guidelines for contributing.
  • [x] My code follows the code style of this project. I used flake8 to check my Python changes.
  • [x] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have added tests to cover my changes.
  • [ ] All new and existing tests passed. I used pytest to check this on Python tests.

ulupo avatar Feb 03 '21 09:02 ulupo

Following @wreise's demonstration that numerical errors can be introduced in the downard sweep if the input has 64-bit float precision (because ripser internally converts to 32-bit precision), I have now added a conversion to float32 at the very beginning of the computation in the extended case.

ulupo avatar Feb 12 '21 10:02 ulupo

Note: In discussion with @wreise, we noted that in the case of extended persistence one cannot dispose of all zero-persistence pairs. Zero-persistence pairs created in "different sweeps" are essential bars in regular persistence! An example is an isolated point.

ulupo avatar Feb 12 '21 10:02 ulupo

I had a pretty thorough look, and imo, it does what it claims to. :) The new additions should be covered by the tests, apart from the plotting method.

wreise avatar Feb 16 '21 17:02 wreise

@wreise thanks for the thorough look! Of course there are still docs to write but more importantly we can't really ship this without intervening in some way on the downstream vectorization methods, even if it's just by saying "this is not supported with extended persistence" in some cases.

ulupo avatar Feb 16 '21 17:02 ulupo

Another point is that the input validation is inadequate still (my fault) because really this transformer should only work with sparse input, and yet we don't throw a useful error if the input is dense.

ulupo avatar Feb 16 '21 17:02 ulupo

(...) we can't really ship this without intervening in some way on the downstream vectorization methods, even if it's just by saying "this is not supported with extended persistence" in some cases.

I agree - the pr is far from ready :)

wreise avatar Feb 16 '21 17:02 wreise