gudhi-devel icon indicating copy to clipboard operation
gudhi-devel copied to clipboard

convert array of filtration values to SimplexTree

Open mglisse opened this issue 4 years ago • 7 comments

The functionality seems to work, but there are at least 2 points I haven't settled on:

  • instead of an instance method that requires an empty SimplexTree, should this be a static method that creates and returns a new SimplexTree?
  • name of the function (this also depends on the previous point). insert_edges_from_matrix_of_filtrations, from_array, etc. For people not using filtration values, there is a slight risk they might misinterpret "matrix" as "adjacency matrix" (with booleans that mark the edges we want).

mglisse avatar May 29 '20 16:05 mglisse

The functionality seems to work, but there are at least 2 points I haven't settled on:

* instead of an instance method that requires an empty SimplexTree, should this be a static method that creates and returns a new SimplexTree?

* name of the function (this also depends on the previous point). insert_edges_from_matrix_of_filtrations, from_array, etc. For people not using filtration values, there is a slight risk they might misinterpret "matrix" as "adjacency matrix" (with booleans that mark the edges we want).

I think I would prefer a static method, as it happens to me several times to insert edges in a non-empty simplex tree while doing my tests. Maybe harder to document. We could document it at the end of src/python/doc/simplex_tree_ref.rst:

.. autofunction:: gudhi.XXX

I think we could name these functions create_simplex_tree_from_matrix (here we lose the edges specification), or something like that to recall the complexes create_simplex_tree methods. I do not have a strong opinion on that.

VincentRouvreau avatar Jul 31 '20 07:07 VincentRouvreau

I think I would prefer a static method,

ok.

Maybe harder to document.

? Static methods appear in the doc, with "static" in front.

I think we could name these functions create_simplex_tree_from_matrix (here we lose the edges specification), or something like that to recall the complexes create_simplex_tree methods. I do not have a strong opinion on that.

Since this is really SimplexTree.create_*, I don't really want to repeat simplex_tree in the name of the function.

mglisse avatar Jul 31 '20 09:07 mglisse

Are you ok for create_from_coo_matrix to be static also ?

VincentRouvreau avatar Aug 04 '20 12:08 VincentRouvreau

Are you ok for create_from_coo_matrix to be static also ?

It seems less obvious. While the other function requires to start from an empty simplex, this one does not, I could imagine people inserting their vertices themselves before using this function for the edges (adding the vertices to the coo_matrix is quite awkward), for instance. In some sense it also depends on how we want to optimize the function.

mglisse avatar Aug 04 '20 13:08 mglisse

Thinking about it, SciPy's coo_matrix is a bit limiting in that it can only represent matrices (2d) and not arbitrary arrays. Making it more general, like torch.sparse, would allow inserting a bunch of simplices of the same dimension. Arrays of dimension > 2 could also make sense for the dense case, @mglisse you need to think about that.

mglisse avatar Nov 28 '20 14:11 mglisse

I am looking at this again, to see what really needs to be done before it can be merged (changing it would modify the API in a way that breaks user code), and what can be done later (optimization, additional functionality).

create_from_array

  • It requires a symmetric numpy.array completely filled: that's the strictest requirement, so it can be relaxed later without breaking code using it.
  • Using the diagonal for vertices is possibly a bit questionable, but not that bad, and we can add a flag to disable this later if it looks useful.
  • The name Simplex_tree.create_from_array looks ok. It could be create_from_adjacency_matrix or many other names, but we have to pick one, and why not this one.
  • Generalizing to build triangles from a 3d array would be possible, but as the dimension increases a dense matrix is more and more redundant, and we can always add another function for that later if needed, or even change this one so it accepts arrays of dimension other than 2.

insert_edges_from_coo_matrix

  • Name looks ok.
  • The main limitation I see is the one already mentioned: we could handle a batch of k-simplices instead of just edges (k=1) if we replace coo_matrix with something similar to torch.sparse. We can add another function for that later. It will be a bit redundant, but the interface would be a bit different (names row and col vs an index).
  • The behavior for "diagonal" entries is questionable. We could document that they aren't allowed, it doesn't prevent from silently interpreting them as vertex filtrations if they do appear, and avoids constraining us for the future.
  • The current implementation is a 2-liner that a user could easily write themselves. Since users will still have the option of inserting the edges one by one, we can afford to be a bit restrictive in what we accept, but we should aim for a C++ implementation (or at least good cython) so it can be a bit faster than the Python loop. Or for now specify the interface in a way that allows to substitute such an implementation later. But it will still call insert_simplex_and_subfaces, so we can only save the cost of iterating and constructing the tuple. If we had the guarantee that we are starting from an empty Simplex_tree, we could sort the edges and build the complex by hand (similar to the code for the matrix), although it may not be worth the trouble.

mglisse avatar Mar 30 '22 21:03 mglisse

I just added insert_batch, which does something similar to insert_edges_from_coo_matrix but for any dimension, and I optimized the code a bit more (the C++ code is not that cheap, so this only improves by a factor of 3 or something of that order). I was going to remove insert_edges_from_coo_matrix, but then noticed the test combining it with cKDTree: it can be convenient to have something that reads exactly this format, without requiring that we extract row and col and combine them. On the other hand, merging 2 arrays isn't very hard, and cKDTree.sparse_distance_matrix may not be the most natural choice when both point sets are the same, cKDTree.query_pairs is more natural, even if it is sadly missing an option to return the distances at the same time.

With results="ndarray", query_pairs returns the transpose of the array insert_batch expects. I oriented the array this way to match what sparse matrices do, but if we see it as a batch insertion, it may be more natural to swap dimensions, so that it looks like a list of simplices.

mglisse avatar Apr 01 '22 21:04 mglisse