First Implementation of a Simplex Trie

Open ffl096 opened this issue 2 years ago • 10 comments

This implements a simplex trie, as presented in [1], as the backend data structure for the SimplicialComplex class. The same structure is used in gudhi's simplicial complex implementation. However, gudhi does not expose all the functionality we need, and its data structure is implemented in native code, so we cannot interact with it directly either.
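For readers unfamiliar with the structure, here is a minimal sketch of the idea. It is not the code in this pull request: the names `TrieNode` and `SimplexTrie` are hypothetical, vertex labels are assumed to be sortable, and insertion enumerates faces by brute force rather than using the more efficient recursion from [1]. Each node corresponds to one simplex, spelled out by the sorted vertex labels on the path from the root.

```python
from __future__ import annotations

from itertools import combinations


class TrieNode:
    """One node of the trie; the path from the root to it spells out a simplex."""

    def __init__(self) -> None:
        # Children keyed by vertex label; labels along any path strictly increase.
        self.children: dict = {}


class SimplexTrie:
    """Toy simplex trie: every face of an inserted simplex gets its own node."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, simplex) -> None:
        """Insert a simplex together with all of its non-empty faces."""
        vertices = sorted(set(simplex))
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                node = self.root
                for vertex in face:
                    node = node.children.setdefault(vertex, TrieNode())

    def __contains__(self, simplex) -> bool:
        """Membership test: walk the sorted vertices down from the root."""
        node = self.root
        for vertex in sorted(set(simplex)):
            if vertex not in node.children:
                return False
            node = node.children[vertex]
        # The empty simplex is not treated as a member in this sketch.
        return node is not self.root

    def simplices(self):
        """Yield every stored simplex as a tuple via depth-first traversal."""
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for vertex, child in node.children.items():
                path = prefix + (vertex,)
                yield path
                stack.append((path, child))


trie = SimplexTrie()
trie.insert([1, 2, 3])  # a triangle and all of its faces
trie.insert([3, 4])     # an extra edge sharing vertex 3
print((1, 3) in trie)
print((2, 4) in trie)
print(sorted(trie.simplices(), key=lambda s: (len(s), s)))
```

Because all simplices live in one tree keyed by sorted vertex labels, a membership query costs one dictionary lookup per vertex, which is where the expected speed-up over a flat collection of simplices comes from.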

Using a simplex tree should bring some nice performance improvements over the previous approach and fix some bugs along the way as well. I will add some comparisons later.

[1] Jean-Daniel Boissonnat and Clément Maria. The Simplex Tree: An Efficient Data Structure for General Simplicial Complexes. Algorithmica, pages 1–22, 2014

ffl096 avatar Aug 18 '23 14:08 ffl096

@mhajij The tests fail because coseg loads a pickled state of SimplicialComplex, including its internal properties. Independent of this pull request, this is a bad idea, as any change to the data structure may lead to errors or, worse, undetected inconsistencies.

ffl096 avatar Aug 18 '23 14:08 ffl096

@ffl096 I am not sure we should merge this pull request now, because the ICML challenge participants might have used that dataset. I think we need to merge their pull requests first, before we merge this one. What do you think?

mhajij avatar Aug 18 '23 15:08 mhajij

This is a draft pull request; it is not to be merged right now regardless :)

However, just to clarify: I do not propose to remove the coseg dataset. We have to think about a reasonable data format for delivering the dataset that does not rely on pickle. Ideally, the return value of the coseg function should stay exactly the same. SimplicialComplex objects in this PR are compatible with the previous implementation as long as the user does not access internal state. The ICML submissions should all be fine.
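As a rough illustration of what a pickle-free format could look like, the sketch below stores only the simplices themselves as JSON and leaves reconstruction to the public constructor. The function names `dump_simplices` and `load_simplices` and the `format_version` field are hypothetical, not part of TopoNetX; vertex labels are assumed to be JSON-representable and sortable, and simplex attributes are ignored here.

```python
import json


def dump_simplices(simplices) -> str:
    """Serialize simplices as sorted vertex lists; no internal object state is stored."""
    rows = sorted((sorted(s) for s in simplices), key=lambda s: (len(s), s))
    return json.dumps({"format_version": 1, "simplices": rows})


def load_simplices(text: str) -> list:
    """Read the simplices back as tuples, ready to be passed to a public constructor."""
    payload = json.loads(text)
    if payload.get("format_version") != 1:
        raise ValueError("unsupported format version")
    return [tuple(s) for s in payload["simplices"]]


maximal = [(1, 2, 3), (3, 4)]
text = dump_simplices(maximal)
restored = load_simplices(text)
# A loader such as coseg could then rebuild the complex through the public API,
# e.g. SimplicialComplex(restored), assuming the constructor accepts an
# iterable of simplices. That call is left out to keep the sketch self-contained.
print(restored)
```

Since the file contains only vertex lists, it stays readable no matter how the internals of SimplicialComplex change, which is exactly what a pickled instance cannot guarantee.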

ffl096 avatar Aug 19 '23 07:08 ffl096

We need to create a Data object that can be used in the higher-order context. I think the one available in torch is good enough.

This is an example of how it can be used in a higher-order DL model: https://github.com/pyt-team/TopoModelX/blob/569bd193f81d47e04891376676c034e90cc07554/tutorials/combinatorial/hmc_train.ipynb

mhajij avatar Aug 19 '23 15:08 mhajij

@ffl096 I think we can merge this now. However, the tests are failing; can you please take care of that, and of the lint errors, so we can merge?

mhajij avatar Sep 14 '23 13:09 mhajij

The dataset issue still stands and is outside of the scope to be fixed here. We cannot reliably use pickled objects as data objects.

ffl096 avatar Sep 14 '23 13:09 ffl096

I cannot merge without passing tests. What do you think we should do? Should we fix the dataset issues first?

mhajij avatar Sep 14 '23 14:09 mhajij

According to git blame, the coseg dataset downloaded from here was preprocessed by you, right? This repo does not contain the preprocessing script; can you provide it to me? The same goes for shrec_16.

ffl096 avatar Sep 14 '23 14:09 ffl096

@ffl096 What do you want to do with this PR? I think we need SC to be faster and implemented correctly, but a lot of code relies on the datasets. What do you suggest?

USFCA-MSDS avatar Feb 09 '24 09:02 USFCA-MSDS

As outlined above, the dataset structure has to be overhauled completely. This is outside of the scope of this pull request though, and needs to be done regardless. The current system is highly unstable. Once that is done, this pull request is good to be merged.

ffl096 avatar Feb 09 '24 09:02 ffl096

Codecov Report

Attention: Patch coverage is 99.63100% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.89%. Comparing base (5b2284b) to head (d6dad04). Report is 2 commits behind head on main.

Files with missing lines                 Patch %   Lines
toponetx/classes/simplicial_complex.py   98.68%    1 Missing
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
+ Coverage   97.83%   97.89%   +0.06%     
==========================================
  Files          38       40       +2     
  Lines        3558     3663     +105     
==========================================
+ Hits         3481     3586     +105     
  Misses         77       77              


codecov[bot] avatar Dec 17 '24 09:12 codecov[bot]