giotto-tda
[DOCS] Unclear documentation for VietorisRipsPersistence padding
The documentation for VietorisRipsPersistence is unclear about the padding it applies. It says that diagrams may be padded with some points on the diagonal, but it does not say what the padding values are or how they are chosen. This can be confusing for users trying to understand the output of the persistence algorithm.
The code at https://github.com/giotto-ai/giotto-tda/blob/8d09a39403ca11b50605bf466c1aa9f4f3876e5f/gtda/homology/_utils.py#L63 shows that the padding points are chosen as the minimum birth value ever observed in that homology dimension, but this is not clear from the documentation.
It would be helpful if the documentation for VietorisRipsPersistence clarified the padding strategy used or if the code were changed to use a more standard padding strategy such as padding with zeros.
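For concreteness, here is a minimal sketch of how the padding can be observed (assuming the usual output layout, i.e. an array of shape (n_samples, n_points, 3) whose rows are (birth, death, homology dimension) triples):

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence

# Point cloud with one H1 feature (a circle) and one with none (a line),
# so the second diagram has to be padded in dimension 1.
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
line = np.stack([np.linspace(0, 1, 20), np.zeros(20)], axis=1)

vr = VietorisRipsPersistence(homology_dimensions=(0, 1))
diagrams = vr.fit_transform([circle, line])  # shape (2, n_points, 3)

# Rows are (birth, death, homology dimension); padded rows lie on the
# diagonal, i.e. birth == death.
for i, diag in enumerate(diagrams):
    on_diagonal = diag[diag[:, 0] == diag[:, 1]]
    print(f"Diagram {i}: {len(on_diagonal)} point(s) on the diagonal")
    print(on_diagonal)
```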
Hey @raphaelreinauer, thank you for pointing this out.
Let me provide clarification here before the docs are updated. For each dimension, we choose the minimum value which appears in any of the diagrams (it is set to zero if there is no finite value), see https://github.com/giotto-ai/giotto-tda/blob/8d09a39403ca11b50605bf466c1aa9f4f3876e5f/gtda/homology/_utils.py#L44-L48
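In other words, the rule is roughly the following (a standalone sketch, not the actual helper behind the link; the function name `pad_diagrams` and its signature are made up for illustration, and diagrams are assumed to be arrays of (birth, death, dimension) rows):

```python
import numpy as np

def pad_diagrams(diagrams, homology_dimensions):
    """Sketch of the padding rule: missing rows in dimension `dim` are
    filled with (m, m, dim), where m is the minimum finite value appearing
    in that dimension across all diagrams, or 0 if there is none."""
    # Target number of rows per dimension = max count across diagrams.
    counts = {dim: max(int(np.sum(d[:, 2] == dim)) for d in diagrams)
              for dim in homology_dimensions}
    # Padding value per dimension = min finite (birth, death) value seen.
    pad_value = {}
    for dim in homology_dimensions:
        values = np.concatenate([d[d[:, 2] == dim, :2].ravel() for d in diagrams])
        finite = values[np.isfinite(values)]
        pad_value[dim] = finite.min() if finite.size else 0.0

    padded = []
    for d in diagrams:
        pieces = []
        for dim in homology_dimensions:
            sub = d[d[:, 2] == dim]
            filler = np.full((counts[dim] - len(sub), 3), pad_value[dim])
            filler[:, 2] = dim  # keep the homology-dimension column
            pieces.append(np.vstack([sub, filler]))
        padded.append(np.vstack(pieces))
    return np.stack(padded)
```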
This choice is indeed not standard, but to the best of my recollection, it was made with the composition of transformers in mind. Several transformers in gtda.diagrams use the min-max values of the diagrams passed to .fit to estimate the range to discretize over. By choosing padding values already in the image of the non-trivial points, we make sure that this range is not distorted by the padding.
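As a toy numerical illustration (the numbers are made up and this is not gtda code): if the smallest value observed in dimension 1 is 0.8, padding with it keeps the value range at [0.8, 1.5], whereas zero padding would stretch it down to [0.0, 1.5] and hence distort any discretization grid fitted on it.

```python
import numpy as np

# Non-trivial H1 points of some hypothetical diagram collection.
h1_points = np.array([[0.8, 1.5], [0.9, 1.1]])

pad_min = np.full((3, 2), h1_points.min())  # padding with the observed minimum
pad_zero = np.zeros((3, 2))                 # hypothetical zero padding

for name, pad in [("min-value padding", pad_min), ("zero padding", pad_zero)]:
    values = np.vstack([h1_points, pad])
    print(f"{name}: value range [{values.min():.1f}, {values.max():.1f}]")
```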
Please let me know if that is clear and/or convincing.
Excellent reply, @wreise! I agree that the padding was done this way for a reason, but @raphaelreinauer has a point that we could/should document it somewhere.
Thanks, @wreise, for providing clarity on this. As @ulupo pointed out, it would be ideal if you could state the padding strategy, and the reason for it, in the docs. For example, people familiar with Transformer models (like the ones used in NLP) could mistakenly assume that the diagrams are padded with zeros, since that is the most common form of padding for Transformer inputs, which could lead to bugs that are hard to detect.