gudhi-devel
Implementing sklearn set_output API for BaseEstimators
Simple example showing how to implement scikit-learn's set_output API and an example test.
This outputs diagrams nicely:
>>> RipsPersistence(homology_dimensions=[0, 2], n_jobs=-2).set_output(transform="pandas").fit_transform(point_clouds)
                                        H0  H2
0  [[0.0, 10.456032537513941], [0.0, inf]]  []
1  [[0.0, 12.925628779692705], [0.0, inf]]  []
2  [[0.0, 11.126843373147965], [0.0, inf]]  []
3  [[0.0, 12.647790894762348], [0.0, inf]]  []
4  [[0.0, 11.340625000427952], [0.0, inf]]  []
Let me add that although this PR is very simple, I am a bit sore that it took me so long to write, for the following reasons:
- I am on Windows (granted, this is on me)
- test_sklearn_rips_persistence.py generates points using gudhi.datasets.generators.points
- gudhi.datasets.generators.points imports from gudhi.datasets.generators._points.cc, which, if I understood correctly, requires CGAL in the gudhi compilation step
- CGAL is currently quite hard to install on some Windows setups:
  - if installed with conda install cgal-cpp, conda does not also install GMP (it claims the specs are incompatible), so gudhi fails at compilation because it cannot find GMP
  - if installed through vcpkg, the install gets stuck on "Building x64-windows-dbg" (see https://github.com/microsoft/vcpkg/issues/31181 for detailed symptoms resembling mine)
  - installing GMP any other way seems nearly impossible on Windows
- but all this turned out to be unnecessary, as the only reason for gudhi.datasets.generators.points to import _points.cc is to ... generate points on a sphere or a torus, which I reckon can easily be done in Python too!
So we could probably get rid of this dependency if I understood that correctly...
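
To make the suggestion concrete, here is a rough NumPy-only sketch of such generators. The names sphere_points and torus_points and their parameters are hypothetical, and I have not checked that the sampling matches what the compiled gudhi generators produce; it only illustrates that no CGAL is needed for this kind of test data.

```python
import numpy as np


def sphere_points(n_samples, ambient_dim=3, radius=1.0, seed=None):
    """Hypothetical helper: uniform random points on a sphere in R^ambient_dim."""
    rng = np.random.default_rng(seed)
    # Normalising isotropic Gaussian samples yields the uniform
    # distribution on the unit sphere.
    pts = rng.standard_normal((n_samples, ambient_dim))
    return radius * pts / np.linalg.norm(pts, axis=1, keepdims=True)


def torus_points(n_samples, dim=2, seed=None):
    """Hypothetical helper: random points on the flat dim-torus in R^(2*dim)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, dim))
    # Each angle contributes a (cos, sin) pair, i.e. one circle factor.
    pairs = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
    return pairs.reshape(n_samples, 2 * dim)


sphere = sphere_points(100, ambient_dim=3, radius=2.0, seed=0)
torus = torus_points(50, dim=2, seed=0)
print(sphere.shape, torus.shape)
```

Passing a seed keeps the generated clouds reproducible, which is convenient in tests.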