gudhi-devel
gudhi-devel copied to clipboard
New format for persistence
(not ready: missing tests at least, and doing the same for cubical) As a step towards using more numpy arrays, this provides an option to SimplexTree.persistence() to output a list of arrays (1 numpy array per dimension), instead of our current list of tuple (all together, with the dimension being part of the tuple). I open the PR to start the discussion.
(this PR requires to merge master for compilation to pass)
import gudhi
rips_complex = gudhi.RipsComplex(points=[[1, 1], [7, 0], [4, 6], [9, 6], [0, 14], [2, 19], [9, 17]],
max_edge_length=12.0)
simplex_tree = rips_complex.create_simplex_tree(max_dimension=1)
simplex_tree.persistence(output_type='old')
# [(0, (0.0, inf)), (0, (0.0, 8.94427190999916)), (0, (0.0, 7.280109889280518)), (0, (0.0, 6.082762530298219)),
# (0, (0.0, 5.830951894845301)), (0, (0.0, 5.385164807134504)), (0, (0.0, 5.0))]
simplex_tree.persistence(output_type='array by dimension')
# [array([[0. , 5. ],
# [0. , 5.38516481],
# [0. , 5.83095189],
# [0. , 6.08276253],
# [0. , 7.28010989],
# [0. , 8.94427191],
# [0. , inf]])]
This format is quite interesting and was initiated from discussion on #395
I don't have much to add except that I am greatly in favor of this and already using the new format using the following simple piece of translator code, if it is any help to anyone / anywhere:
def _diag_to_list_by_dim_format(pdiagram, tda_max_dim=None):
""" transforms list of Tuple(dimension, Tuple(x,y)) into list of ndarray[[x_0, y_0], [x_1, y_1], ...]
for each dimension, so if Tuple(x,y) has dimension i it will be found in the ith ndarray of
the resulting list, e.g.:
[(2, (1.0577792405537423, 1.1003878733068035)),
(0, (0.0, np.inf)),
(0, (0.0, 1.0556057201636535)),
(0, (0.0, 0.8756047102452433))]
transforms into
[array([[0. , inf],
[0. , 1.05560572],
[0. , 0.87560471]]),
array([], shape=(0, 2), dtype=float64),
array([[1.05777924, 1.10038787]])].
"""
max_dimension = max(dim for (dim, _) in pdiagram) if tda_max_dim is None else tda_max_dim
by_dim_pdiagram = [np.array([_ for (dim, _) in pdiagram if dim == i]) for i in range(0, max_dimension + 1)]
return [diag if len(diag) else np.empty(shape=[0, 2]) for diag in by_dim_pdiagram]
I also suggested future tests for diagram vectorizers in the new format there: https://github.com/GUDHI/gudhi-devel/pull/1017/files#diff-509c3dbe85dd6b515d6d3e33d0f6c074686dcb27367baa653eeedeeec534f1c8. As Marc hinted it allows for nice interactions with sklearn.compose.ColumnTransformer
for instance.