Drop Python 2 Support, Modernize NumPy/SciPy APIs
Summary
This PR upgrades GraKeL to support Python 3.8 and above, officially dropping Python 2.x compatibility, which has been end-of-life since January 1, 2020.
In addition:
- [x] Refactored code to align with modern NumPy and SciPy APIs
- [x] Updated
pyproject.tomlto specify minimum Python version and dependencies - [x] Removed or updated legacy compatibility code (e.g., six, future imports).
Compatibility
- BREAKING CHANGE: Python <3.8 is no longer supported.
- Unit tests now target Python 3.8+ with the latest NumPy/SciPy versions.
Rationale
- Improves long-term maintainability and compatibility with modern ML pipelines.
- Fixes deprecation warnings introduced by recent versions of NumPy and SciPy.
- Unblocks integration with newer frameworks (e.g., PyTorch Geometric, ONNX tools).
Runtime Warnings in Tests
During the upgrade, the following test cases raise RuntimeWarning: invalid value encountered in sqrt or divide:
grakel/tests/test_Kernel.py::test_svm_theta
grakel/tests/test_Kernel.py::test_svm_theta_pd
/data/zyx/GraKeL/grakel/kernels/kernel.py:196: RuntimeWarning: invalid value encountered in divide
return km / np.sqrt(np.outer(self._X_diag, self._X_diag))
grakel/tests/test_Kernel.py::test_svm_theta
grakel/tests/test_Kernel.py::test_svm_theta_pd
/data/zyx/GraKeL/grakel/kernels/kernel.py:161: RuntimeWarning: invalid value encountered in divide
km /= np.sqrt(np.outer(Y_diag, X_diag))
grakel/tests/test_Kernel.py::test_neighborhood_subgraph_pairwise_distance
/data/zyx/GraKeL/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:311: RuntimeWarning: invalid value encountered in divide
Q = K / np.sqrt(np.outer(K_diag, K_diag))
grakel/tests/test_Kernel.py::test_neighborhood_subgraph_pairwise_distance
/data/zyx/GraKeL/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:275: RuntimeWarning: invalid value encountered in divide
S += np.nan_to_num(K / np.sqrt(np.outer(np.array(Mp.power(2).sum(-1)), N[key])))
grakel/tests/test_Kernel.py::test_random_walk
grakel/tests/test_Kernel.py::test_random_walk_pd
grakel/tests/test_Kernel.py::test_random_walk_labels
grakel/tests/test_Kernel.py::test_random_walk_labels_pd
/data/zyx/GraKeL/grakel/kernels/kernel.py:196: RuntimeWarning: invalid value encountered in sqrt
return km / np.sqrt(np.outer(self._X_diag, self._X_diag))
grakel/tests/test_Kernel.py::test_random_walk
grakel/tests/test_Kernel.py::test_random_walk_pd
grakel/tests/test_Kernel.py::test_random_walk_labels
grakel/tests/test_Kernel.py::test_random_walk_labels_pd
/data/zyx/GraKeL/grakel/kernels/kernel.py:161: RuntimeWarning: invalid value encountered in sqrt
km /= np.sqrt(np.outer(Y_diag, X_diag))
These warnings may stem from:
- Division by zero
- Negative values under square roots
- Improper normalization steps
They may indicate numerical instability or missing input validation in kernels. Review is requested to:
- Confirm whether these are expected (e.g., due to random test data)
- Improve test assertions if needed
@yuxuan-z19 Thank you for your pull request! I'm very willing to proceed and pull it. I require some major and some minor changes.
Major Please make the kernels work:
- test_svm_theta
- test_neighborhood_subgraph_pairwise_distance
- test_random_walk It seems that one of your changes breaks them (it's probably an overflow error?)
Random walk is a very fundamental kernel to approve me have it work while its failing. @giannisnik maybe can you have a look to help @yuxuan-z19 identify the error so we can fix it? (it's important that we pull this request as a lot of people are having hard time installing the library now.)
Minor updates @yuxuan-z19 :
- update the version to 0.1.11
- add uv.lock in .gitignore
@ysig @giannisnik I've investigated the failing tests in grakel/tests/test_Kernel.py. All of them invoke generate_dataset() and trigger failures reproducibly on commit 6a9cebf.
The issue stems from self._X_diag containing negative values, causing np.outer() to yield invalid results during normalization (i.e., sqrt() over negative values).
As a workaround, I suggest applying a positive semi-definite (PSD) correction by shifting the kernel matrix before normalization. Here is the code snippet implementing this approach:
epsilon = 1e-12
min_diag = np.min(X_diag)
if min_diag < 0:
shift = abs(min_diag) + epsilon
km += shift * np.eye(km.shape[0])
X_diag = np.diagonal(km)
out = np.outer(X_diag, X_diag)
out[out == 0] = epsilon
res = km / np.sqrt(np.outer(X_diag, X_diag))
This ensures the diagonal entries are all positive before computing the normalization denominator.
I will add a new test to verify the correctness and stability of kernel values before and after applying the PSD adjustment on the successful tests in the test_Kernel.py.
@yuxuan-z19 Thank you for locating this. Diagonal elements shouldn't be negative (probably an overflow) nor close to zero.
- Overflow sounds related to:
x_sol, _ = cg(A, b, rtol=1.0e-6, maxiter=20) - Close to zero could be related to
generated_dataset. @giannisnik what do you think?