gudhi-devel

New format for persistence

Open mglisse opened this issue 1 year ago • 2 comments

(Not ready: tests are missing at least, and the same remains to be done for cubical.) As a step towards using more numpy arrays, this adds an option to SimplexTree.persistence() to output a list of arrays (one numpy array per dimension) instead of our current list of tuples (all dimensions together, with the dimension being part of each tuple). I am opening the PR to start the discussion.

mglisse, Jul 28 '23 11:07

(This PR requires merging master for compilation to pass.)

import gudhi
rips_complex = gudhi.RipsComplex(points=[[1, 1], [7, 0], [4, 6], [9, 6], [0, 14], [2, 19], [9, 17]],
                                 max_edge_length=12.0)

simplex_tree = rips_complex.create_simplex_tree(max_dimension=1)
simplex_tree.persistence(output_type='old')
# [(0, (0.0, inf)), (0, (0.0, 8.94427190999916)), (0, (0.0, 7.280109889280518)), (0, (0.0, 6.082762530298219)),
#  (0, (0.0, 5.830951894845301)), (0, (0.0, 5.385164807134504)), (0, (0.0, 5.0))]
simplex_tree.persistence(output_type='array by dimension')
# [array([[0.        , 5.        ],
#        [0.        , 5.38516481],
#        [0.        , 5.83095189],
#        [0.        , 6.08276253],
#        [0.        , 7.28010989],
#        [0.        , 8.94427191],
#        [0.        ,        inf]])]

This format is quite interesting and originated from the discussion in #395.
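To illustrate what the per-dimension arrays make easy, here is a minimal numpy-only sketch. It does not call GUDHI: the dimension-0 array is copied from the printed output above (so the values are rounded), and `by_dim_to_tuples` is a hypothetical helper, not part of the library, that goes back to the list-of-tuples layout. The order of the resulting tuples follows the arrays and is not guaranteed to match what `output_type='old'` returns.

```python
import numpy as np


def by_dim_to_tuples(diagrams):
    """Hypothetical helper (not a GUDHI function): flatten a list of (n_i, 2)
    arrays, one per dimension, into a list of (dimension, (birth, death)) tuples."""
    return [
        (dim, (float(birth), float(death)))
        for dim, diagram in enumerate(diagrams)
        for birth, death in diagram
    ]


# Dimension-0 diagram copied from the 'array by dimension' output above.
h0 = np.array([
    [0.0, 5.0],
    [0.0, 5.38516481],
    [0.0, 5.83095189],
    [0.0, 6.08276253],
    [0.0, 7.28010989],
    [0.0, 8.94427191],
    [0.0, np.inf],
])
diagrams = [h0]

# Vectorised operations on a single dimension: drop the infinite bar,
# then compute the lifetime (death - birth) of the remaining bars.
finite = np.isfinite(h0[:, 1])
lifetimes = h0[finite, 1] - h0[finite, 0]

# Round trip to the tuple-based layout.
as_tuples = by_dim_to_tuples(diagrams)
```

With the tuple-based format, the same filtering would first require selecting the tuples of the right dimension and unpacking them by hand.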

VincentRouvreau, Nov 29 '23 08:11

I don't have much to add, except that I am greatly in favor of this and am already using the new format via the following simple piece of translator code, in case it is of any help to anyone:

import numpy as np


def _diag_to_list_by_dim_format(pdiagram, tda_max_dim=None):
    """ transforms list of Tuple(dimension, Tuple(x,y)) into list of ndarray[[x_0, y_0], [x_1, y_1], ...]
     for each dimension, so if Tuple(x,y) has dimension i it will be found in the ith ndarray of
     the resulting list, e.g.:
         [(2, (1.0577792405537423, 1.1003878733068035)),
          (0, (0.0, np.inf)),
          (0, (0.0, 1.0556057201636535)),
          (0, (0.0, 0.8756047102452433))]
     transforms into
         [array([[0.        ,        inf],
                 [0.        , 1.05560572],
                 [0.        , 0.87560471]]),
         array([], shape=(0, 2), dtype=float64),
         array([[1.05777924, 1.10038787]])].
    """
    # Highest dimension to report: taken from the diagram unless given explicitly.
    max_dimension = max(dim for (dim, _) in pdiagram) if tda_max_dim is None else tda_max_dim
    # One (n_i, 2) array per dimension; reshape keeps empty dimensions at shape (0, 2).
    return [np.array([interval for (dim, interval) in pdiagram if dim == i], dtype=float).reshape(-1, 2)
            for i in range(max_dimension + 1)]

I also suggested future tests for diagram vectorizers in the new format here: https://github.com/GUDHI/gudhi-devel/pull/1017/files#diff-509c3dbe85dd6b515d6d3e33d0f6c074686dcb27367baa653eeedeeec534f1c8. As Marc hinted, it allows for nice interactions with sklearn.compose.ColumnTransformer, for instance.
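As a rough sketch of what such an interaction could look like (this is an assumption for illustration, not the code from the linked PR): store one diagram per cell of an object array, with one column per homology dimension, and let ColumnTransformer route each column to its own transformer. The two feature functions `total_persistence` and `n_points` and the toy diagrams are made up for the example; GUDHI's own vectorizers are deliberately not used here.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer


def total_persistence(column):
    """Illustrative feature: sum of finite bar lengths, one value per sample.
    `column` is an (n_samples, 1) object array whose cells are (n, 2) diagrams."""
    rows = []
    for diagram in column[:, 0]:
        finite = diagram[np.isfinite(diagram[:, 1])]
        rows.append([float(np.sum(finite[:, 1] - finite[:, 0]))])
    return np.array(rows)


def n_points(column):
    """Illustrative feature: number of points in each diagram."""
    return np.array([[len(diagram)] for diagram in column[:, 0]], dtype=float)


# Toy data: two samples, each given as [H0 diagram, H1 diagram].
samples = [
    [np.array([[0.0, 1.0], [0.0, np.inf]]), np.array([[0.5, 2.0]])],
    [np.array([[0.0, 3.0], [0.0, 0.5], [0.0, np.inf]]), np.empty((0, 2))],
]

# Object array with one row per sample and one column per dimension.
X = np.empty((len(samples), 2), dtype=object)
for i, diagrams in enumerate(samples):
    for j, diagram in enumerate(diagrams):
        X[i, j] = diagram

# Column 0 (H0) and column 1 (H1) each get their own transformer.
features = ColumnTransformer([
    ("h0_total", FunctionTransformer(total_persistence), [0]),
    ("h1_count", FunctionTransformer(n_points), [1]),
]).fit_transform(X)
```

The point of the design is that each dimension can be vectorized independently and the results are concatenated into a single feature matrix, which is awkward to express when all dimensions are mixed in one list of tuples.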

martinroyer, Jun 25 '24 10:06