giotto-tda
giotto-tda copied to clipboard
[WIP] Extended persistence and lower star filtrations
Reference issues/PRs #337 #546
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Description
Begins adding support for extended persistence via a new class LowerStarFlagPersistence
and a new plotting function plot_extended_diagram
. Does not yet address downstream processing of extended diagrams. There is also a new data structure for extended persistence diagrams. An extended persistence diagram is a 2D ndarray of shape (n_features, 4)
where the first 3 columns are as for ordinary persistence (birth-death-dimension), and the fourth is either 1
or -1
: 1
when the feature was born and died during the same "sweep", -1
otherwise. This allows to partition the extended diagram into the usual 4 portions:
- birth < death and same sweep
- birth < death and different sweep
- birth > death and same sweep
- birth > death and different sweep
The extended persistence diagrams are obtained by "coning".
Numerical stability issues have not yet been addressed.
Checklist
- [x] I have read the guidelines for contributing.
- [x] My code follows the code style of this project. I used
flake8
to check my Python changes. - [x] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed. I used
pytest
to check this on Python tests.
Following @wreise's demonstration that numerical errors can be introduced in the downard sweep if the input has 64-bit float precision (because ripser
internally converts to 32-bit precision), I have now added a conversion to float32 at the very beginning of the computation in the extended case.
Note: In discussion with @wreise, we noted that in the case of extended persistence one cannot dispose of all zero-persistence pairs. Zero-persistence pairs created in "different sweeps" are essential bars in regular persistence! An example is an isolated point.
I had a pretty thorough look, and imo, it does what it claims to. :) The new additions should be covered by the tests, apart from the plotting method.
@wreise thanks for the thorough look! Of course there are still docs to write but more importantly we can't really ship this without intervening in some way on the downstream vectorization methods, even if it's just by saying "this is not supported with extended persistence" in some cases.
Another point is that the input validation is inadequate still (my fault) because really this transformer should only work with sparse input, and yet we don't throw a useful error if the input is dense.
(...) we can't really ship this without intervening in some way on the downstream vectorization methods, even if it's just by saying "this is not supported with extended persistence" in some cases.
I agree - the pr is far from ready :)