scikit-matter icon indicating copy to clipboard operation
scikit-matter copied to clipboard

Sparse kde

Open GardevoirX opened this issue 1 year ago • 4 comments

This PR introduces SparseKDE:

  • The class SparseKDE is located at src/skmatter/utils/_sparsekde.py. It mitigates the high cost of doing KDE for large datasets by doing KDE for selected data points (e.g. grid points sampled by farthest point-sampling). This class takes the original dataset as a parameter and fits the model using the sampled grid points.
  • There are two auxiliary classes and some auxiliary functions of SparseKDE stored in src/skmatter/utils/_sparsekde.py.
  • Two distance metrics compatible with PBC, pairwise_euclidean_distances and pairwise_mahalanobis_distances, are realized and stored in src/skmatter/metrics/pairwise.py.
  • Tests for SparseKDE and some auxiliary functions are stored in tests/test_neighbors.py. Tests for distance metrics are stored in tests/test_metrics.py.

I am not sure if the current API of SparseKDE is OK and if the auxiliary classes should be integrated into SparseKDE. Also, SparseKDE seems to be too large and complex. Perhaps it needs to be decomposed into smaller parts, but I have not figured out how.

Contributor (creator of PR) checklist

  • [x] Tests updated (for new features and bugfixes)?
  • [x] Documentation updated (for new features)?
  • [ ] Issue referenced (for PRs that solve an issue)?

For Reviewer

  • [x] CHANGELOG updated if important change?

📚 Documentation preview 📚: https://scikit-matter--222.org.readthedocs.build/en/222/

GardevoirX avatar Feb 16 '24 08:02 GardevoirX

Also, can you please update the CHANGELOG file.

PicoCentauri avatar May 02 '24 06:05 PicoCentauri

Hi @agoscinski, thank you for your review! I just accepted several modifications and found that the build of docs failed, and the error seems to be out of the scope of this PR. I checked this branch and it is not behind the main branch. Please take a look:

Extension error:
Here is a summary of the problems encountered when running the examples:

Unexpected failing examples:

    ../../examples/selection/GCH-ROY.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/checkouts/222/examples/selection/GCH-ROY.py", line 28, in <module>
        roy_data = load_roy_dataset()
                   ^^^^^^^^^^^^^^^^^^
      File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/envs/222/lib/python3.12/site-packages/skmatter/datasets/_base.py", line 145, in load_roy_dataset
        energies = np.array([f.info["energy"] for f in structures])
                             ~~~~~~^^^^^^^^^^
    KeyError: 'energy'

After I merged the main branch to the main branch of my fork, I also found that the build of docs failed.

GardevoirX avatar Jun 10 '24 17:06 GardevoirX

Hi @agoscinski, thank you for your review! I just accepted several modifications and found that the build of docs failed, and the error seems to be out of the scope of this PR. I checked this branch and it is not behind the main branch. Please take a look:

Extension error:
Here is a summary of the problems encountered when running the examples:

Unexpected failing examples:

    ../../examples/selection/GCH-ROY.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/checkouts/222/examples/selection/GCH-ROY.py", line 28, in <module>
        roy_data = load_roy_dataset()
                   ^^^^^^^^^^^^^^^^^^
      File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/envs/222/lib/python3.12/site-packages/skmatter/datasets/_base.py", line 145, in load_roy_dataset
        energies = np.array([f.info["energy"] for f in structures])
                             ~~~~~~^^^^^^^^^^
    KeyError: 'energy'

After I merged the main branch to the main branch of my fork, I also found that the build of docs failed.

This is an issue related to the new ase 3.23.0 version that was release a week ago. One can not use the key "energy" anymore to extract the energy but instead should use the get_potential_energy() methof of the Atoms instance. We have to fix this in a different PR.

PicoCentauri avatar Jun 10 '24 20:06 PicoCentauri

Hi @agoscinski, I think this PR is ready for further review. Thank you for your time!

GardevoirX avatar Aug 09 '24 12:08 GardevoirX