Drop Python 2 Support, Modernize NumPy/SciPy APIs

Open yuxuan-z19 opened this issue 5 months ago • 3 comments

Summary

This PR upgrades GraKeL to support Python 3.8 and above, officially dropping Python 2.x compatibility, which has been end-of-life since January 1, 2020.

In addition:

  • [x] Refactored code to align with modern NumPy and SciPy APIs
  • [x] Updated pyproject.toml to specify minimum Python version and dependencies
  • [x] Removed or updated legacy compatibility code (e.g., six, future imports)
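
As an illustration only (these lines are not taken from the PR diff), cleanup of this kind typically looks like the sketch below. The variable names are hypothetical. The one library fact relied on is that the np.float alias was removed in NumPy 1.24.

```python
import numpy as np

# Legacy Python 2/3 shim:
#     from six import iteritems
#     total = sum(v for _, v in iteritems(counts))
# Python 3 only: dict.items() already returns a lazy view.
counts = {"a": 1, "b": 2}
total = sum(v for _, v in counts.items())

# Legacy NumPy alias (removed in NumPy 1.24):
#     zeros = np.zeros(3, dtype=np.float)
# Current spelling: use the builtin float (or np.float64 explicitly).
zeros = np.zeros(3, dtype=float)
```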

Compatibility

  • BREAKING CHANGE: Python <3.8 is no longer supported.
  • Unit tests now target Python 3.8+ with the latest NumPy/SciPy versions.

Rationale

  • Improves long-term maintainability and compatibility with modern ML pipelines.
  • Fixes deprecation warnings introduced by recent versions of NumPy and SciPy.
  • Unblocks integration with newer frameworks (e.g., PyTorch Geometric, ONNX tools).

Runtime Warnings in Tests

During the upgrade, the following test cases emit RuntimeWarning messages of the form "invalid value encountered in sqrt" or "invalid value encountered in divide":

grakel/tests/test_Kernel.py::test_svm_theta
grakel/tests/test_Kernel.py::test_svm_theta_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:196: RuntimeWarning: invalid value encountered in divide
    return km / np.sqrt(np.outer(self._X_diag, self._X_diag))

grakel/tests/test_Kernel.py::test_svm_theta
grakel/tests/test_Kernel.py::test_svm_theta_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:161: RuntimeWarning: invalid value encountered in divide
    km /= np.sqrt(np.outer(Y_diag, X_diag))

grakel/tests/test_Kernel.py::test_neighborhood_subgraph_pairwise_distance
  /data/zyx/GraKeL/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:311: RuntimeWarning: invalid value encountered in divide
    Q = K / np.sqrt(np.outer(K_diag, K_diag))

grakel/tests/test_Kernel.py::test_neighborhood_subgraph_pairwise_distance
  /data/zyx/GraKeL/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:275: RuntimeWarning: invalid value encountered in divide
    S += np.nan_to_num(K / np.sqrt(np.outer(np.array(Mp.power(2).sum(-1)), N[key])))

grakel/tests/test_Kernel.py::test_random_walk
grakel/tests/test_Kernel.py::test_random_walk_pd
grakel/tests/test_Kernel.py::test_random_walk_labels
grakel/tests/test_Kernel.py::test_random_walk_labels_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:196: RuntimeWarning: invalid value encountered in sqrt
    return km / np.sqrt(np.outer(self._X_diag, self._X_diag))

grakel/tests/test_Kernel.py::test_random_walk
grakel/tests/test_Kernel.py::test_random_walk_pd
grakel/tests/test_Kernel.py::test_random_walk_labels
grakel/tests/test_Kernel.py::test_random_walk_labels_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:161: RuntimeWarning: invalid value encountered in sqrt
    km /= np.sqrt(np.outer(Y_diag, X_diag))

These warnings may stem from:

  • Division by zero
  • Negative values under square roots
  • Improper normalization steps
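
One way to tell these causes apart is to turn NumPy's floating-point warnings into exceptions with np.errstate. The sketch below assumes the normalization has the form km / np.sqrt(np.outer(diag, diag)) shown in the warnings above. The helper name locate_invalid_step is hypothetical and not part of GraKeL.

```python
import numpy as np

def locate_invalid_step(km, diag):
    """Return "sqrt", "divide", or None depending on which step is invalid."""
    # Inside this block, invalid operations and divisions by zero raise
    # FloatingPointError instead of emitting a RuntimeWarning.
    with np.errstate(invalid="raise", divide="raise"):
        try:
            denom = np.sqrt(np.outer(diag, diag))
        except FloatingPointError:
            return "sqrt"      # a negative value ended up under the square root
        try:
            km / denom
        except FloatingPointError:
            return "divide"    # 0/0 or x/0 in the normalization
    return None
```

Wrapping a failing test in the same np.errstate context would make the first offending operation raise, which gives a full traceback instead of a warning.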

They may indicate numerical instability or missing input validation in kernels. Review is requested to:

  • Confirm whether these are expected (e.g., due to random test data)
  • Improve test assertions if needed

yuxuan-z19, Jul 26 '25 02:07

@yuxuan-z19 Thank you for your pull request! I'm very willing to proceed and merge it, but I need some major and some minor changes first.

Major: please make the following kernel tests work:

  • test_svm_theta
  • test_neighborhood_subgraph_pairwise_distance
  • test_random_walk

It seems that one of your changes breaks them (probably an overflow error?).

Random walk is too fundamental a kernel for me to approve this while it is failing. @giannisnik, could you have a look and help @yuxuan-z19 identify the error so we can fix it? (It's important that we merge this request, as a lot of people are having a hard time installing the library now.)

Minor updates, @yuxuan-z19:

  • update the version to 0.1.11
  • add uv.lock in .gitignore

ysig, Jul 26 '25 08:07

@ysig @giannisnik I've investigated the failing tests in grakel/tests/test_Kernel.py. All of them invoke generate_dataset() and trigger failures reproducibly on commit 6a9cebf.

The issue stems from self._X_diag containing negative values. np.outer(self._X_diag, self._X_diag) then has negative entries wherever a negative diagonal value is paired with a positive one, and np.sqrt() returns NaN for those entries during normalization.
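
A tiny reproduction of this mechanism, using a made-up 2×2 matrix rather than actual GraKeL output:

```python
import numpy as np

# Toy kernel matrix with one negative diagonal entry (not real GraKeL data).
km = np.array([[1.0, 0.5],
               [0.5, -1.0]])
diag = np.diagonal(km)

# Mixed-sign pairs give negative products: [[1, -1], [-1, 1]].
denom = np.outer(diag, diag)

# Silence the RuntimeWarning for the demo; sqrt of a negative entry is NaN.
with np.errstate(invalid="ignore"):
    normalized = km / np.sqrt(denom)

nan_mask = np.isnan(normalized)  # True exactly on the off-diagonal entries
```

Note that the entry for the negative diagonal value itself is not NaN: it normalizes to -1 instead of 1, so checking for NaN alone would not catch it.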

As a workaround, I suggest applying a positive semi-definite (PSD) correction by shifting the kernel matrix before normalization. Here is the code snippet implementing this approach:

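import numpy as np

# km: square kernel matrix (float array); X_diag: its diagonal. Both are assumed to be defined.
epsilon = 1e-12
min_diag = np.min(X_diag)
if min_diag < 0:
    # Shift the whole diagonal so that its smallest entry becomes +epsilon.
    shift = abs(min_diag) + epsilon
    km += shift * np.eye(km.shape[0])
    X_diag = np.diagonal(km)
out = np.outer(X_diag, X_diag)
out[out == 0] = epsilon  # guard against division by zero
res = km / np.sqrt(out)  # divide by the guarded denominator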

This ensures the diagonal entries are all positive before computing the normalization denominator.
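
A positive diagonal is necessary but not sufficient for a matrix to be PSD, so the shift above repairs the normalization denominator rather than guaranteeing a PSD matrix. If a genuinely PSD matrix is wanted, one alternative is to shift by the most negative eigenvalue instead of the most negative diagonal entry. This is a different technique from the snippet above, shown only as a sketch; shift_to_psd is a hypothetical name.

```python
import numpy as np

def shift_to_psd(km, epsilon=1e-12):
    """Return a symmetrized copy of km whose smallest eigenvalue is >= 0 (up to rounding)."""
    km = np.asarray(km, dtype=float)
    sym = (km + km.T) / 2.0                    # eigvalsh expects a symmetric matrix
    min_eig = np.linalg.eigvalsh(sym).min()
    if min_eig < 0:
        sym = sym + (abs(min_eig) + epsilon) * np.eye(sym.shape[0])
    return sym

# Positive diagonal, yet not PSD: the eigenvalues are -1 and 3.
km = np.array([[1.0, 2.0],
               [2.0, 1.0]])
fixed = shift_to_psd(km)
diag = np.diagonal(fixed)
normalized = fixed / np.sqrt(np.outer(diag, diag))
```

The eigenvalue computation costs O(n^3), so whether this is acceptable inside a kernel's normalization path is a design question for the maintainers.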

I will add a new test to verify the correctness and stability of the kernel values before and after applying the PSD adjustment, using the tests in test_Kernel.py that currently pass.
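
Such a test might look like the following sketch. It assumes the workaround is wrapped in a function (here called normalize_shifted, a hypothetical name) and uses synthetic matrices rather than generate_dataset().

```python
import numpy as np

def normalize_plain(km):
    diag = np.diagonal(km)
    return km / np.sqrt(np.outer(diag, diag))

def normalize_shifted(km, epsilon=1e-12):
    km = np.array(km, dtype=float)            # work on a copy
    diag = np.diagonal(km)
    min_diag = diag.min()
    if min_diag < 0:
        km += (abs(min_diag) + epsilon) * np.eye(km.shape[0])
        diag = np.diagonal(km)
    out = np.outer(diag, diag)
    out[out == 0] = epsilon
    return km / np.sqrt(out)

def test_shift_is_noop_for_positive_diagonal():
    rng = np.random.default_rng(0)
    features = rng.random((5, 3))
    km = features @ features.T                # Gram matrix: positive diagonal
    np.testing.assert_allclose(normalize_shifted(km), normalize_plain(km))

def test_shift_removes_nan_for_negative_diagonal():
    km = np.array([[1.0, 0.5], [0.5, -1.0]])
    assert np.isfinite(normalize_shifted(km)).all()
```

The second test only checks finiteness. When a diagonal entry is shifted to roughly epsilon, the corresponding off-diagonal normalized values can become very large, so finiteness alone does not show that the result is meaningful.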

yuxuan-z19, Jul 26 '25 10:07

@yuxuan-z19 Thank you for locating this. Diagonal elements should be neither negative (probably an overflow) nor close to zero.

  • The overflow sounds related to: x_sol, _ = cg(A, b, rtol=1.0e-6, maxiter=20)
  • Values close to zero could be related to generate_dataset(). @giannisnik, what do you think?

ysig, Jul 26 '25 11:07