giotto-tda
[DOCS] Unclear documentation for VietorisRipsPersistence padding
The documentation for VietorisRipsPersistence is unclear about the padding it applies. It says that diagrams may be padded with some points on the diagonal, but it does not say what the padding values are or how they are chosen. This can be confusing for users trying to understand the output of the persistence algorithm.
The code at https://github.com/giotto-ai/giotto-tda/blob/8d09a39403ca11b50605bf466c1aa9f4f3876e5f/gtda/homology/_utils.py#L63 shows that the padding points are chosen as the minimum birth value ever observed in that homology dimension, but this is not clear from the documentation.
It would be helpful if the documentation for VietorisRipsPersistence clarified the padding strategy used or if the code were changed to use a more standard padding strategy such as padding with zeros.
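For concreteness, here is a minimal sketch of how the padding can be observed (assuming the usual output layout, i.e. an array of shape (n_samples, n_points, 3) whose rows are (birth, death, homology dimension) triples):

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence

# Point cloud with one H1 feature (a circle) and one with none (a line),
# so the second diagram has to be padded in dimension 1.
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
line = np.stack([np.linspace(0, 1, 20), np.zeros(20)], axis=1)

vr = VietorisRipsPersistence(homology_dimensions=(0, 1))
diagrams = vr.fit_transform([circle, line])  # shape (2, n_points, 3)

# Rows are (birth, death, homology dimension); padded rows lie on the
# diagonal, i.e. birth == death.
for i, diag in enumerate(diagrams):
    on_diagonal = diag[diag[:, 0] == diag[:, 1]]
    print(f"Diagram {i}: {len(on_diagonal)} point(s) on the diagonal")
    print(on_diagonal)
```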
Hey @raphaelreinauer, thank you for pointing this out.
Let me provide clarification here before the docs are updated. For each dimension, we choose the minimum value which appears in any of the diagrams (it is set to zero if there is no finite value), see https://github.com/giotto-ai/giotto-tda/blob/8d09a39403ca11b50605bf466c1aa9f4f3876e5f/gtda/homology/_utils.py#L44-L48
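In other words, the rule is roughly the following (a standalone sketch, not the actual helper behind the link; the function name `pad_diagrams` and its signature are made up for illustration, and diagrams are assumed to be arrays of (birth, death, dimension) rows):

```python
import numpy as np

def pad_diagrams(diagrams, homology_dimensions):
    """Sketch of the padding rule: missing rows in dimension `dim` are
    filled with (m, m, dim), where m is the minimum finite value appearing
    in that dimension across all diagrams, or 0 if there is none."""
    # Target number of rows per dimension = max count across diagrams.
    counts = {dim: max(int(np.sum(d[:, 2] == dim)) for d in diagrams)
              for dim in homology_dimensions}
    # Padding value per dimension = min finite (birth, death) value seen.
    pad_value = {}
    for dim in homology_dimensions:
        values = np.concatenate([d[d[:, 2] == dim, :2].ravel() for d in diagrams])
        finite = values[np.isfinite(values)]
        pad_value[dim] = finite.min() if finite.size else 0.0

    padded = []
    for d in diagrams:
        pieces = []
        for dim in homology_dimensions:
            sub = d[d[:, 2] == dim]
            filler = np.full((counts[dim] - len(sub), 3), pad_value[dim])
            filler[:, 2] = dim  # keep the homology-dimension column
            pieces.append(np.vstack([sub, filler]))
        padded.append(np.vstack(pieces))
    return np.stack(padded)
```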
This choice is indeed not standard, but to the best of my recollection, it was made with the composition of transformers in mind. Several transformers in gtda.diagrams use the min-max values of the diagrams passed to .fit to estimate the range to discretize over. By choosing padding values already in the image of the non-trivial points, we make sure that this range is not distorted by the padding.
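As a toy numerical illustration (the numbers are made up and this is not gtda code): if the smallest value observed in dimension 1 is 0.8, padding with it keeps the value range at [0.8, 1.5], whereas zero padding would stretch it down to [0.0, 1.5] and hence distort any discretization grid fitted on it.

```python
import numpy as np

# Non-trivial H1 points of some hypothetical diagram collection.
h1_points = np.array([[0.8, 1.5], [0.9, 1.1]])

pad_min = np.full((3, 2), h1_points.min())  # padding with the observed minimum
pad_zero = np.zeros((3, 2))                 # hypothetical zero padding

for name, pad in [("min-value padding", pad_min), ("zero padding", pad_zero)]:
    values = np.vstack([h1_points, pad])
    print(f"{name}: value range [{values.min():.1f}, {values.max():.1f}]")
```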
Please let me know if that is clear and/or convincing.
Excellent reply, @wreise! I agree that the padding was done this way for a reason, but @raphaelreinauer has a point that we could/should document it somewhere.
Thanks, @wreise, for providing clarity on this. As @ulupo pointed out, it would be ideal if you could state the padding strategy, and the reason for it, in the docs. For example, people familiar with Transformer models (like the ones used in NLP) could mistakenly assume that the diagrams are padded with zeros, since that is the most common form of padding for Transformer inputs, which could lead to bugs that are hard to detect.