scikit-matter
scikit-matter copied to clipboard
Sparse kde
This PR introduces SparseKDE:
- The class
SparseKDE
is located atsrc/skmatter/utils/_sparsekde.py
. It mitigates the high cost of doing KDE for large datasets by doing KDE for selected data points (e.g. grid points sampled by farthest point-sampling). This class takes the original dataset as a parameter and fits the model using the sampled grid points. - There are two auxiliary classes and some auxiliary functions of
SparseKDE
stored insrc/skmatter/utils/_sparsekde.py
. - Two distance metrics compatible with PBC,
pairwise_euclidean_distances
andpairwise_mahalanobis_distances
, are realized and stored insrc/skmatter/metrics/pairwise.py
. - Tests for
SparseKDE
and some auxiliary functions are stored intests/test_neighbors.py
. Tests for distance metrics are stored intests/test_metrics.py
.
I am not sure if the current API of SparseKDE
is OK and if the auxiliary classes should be integrated into SparseKDE
. Also, SparseKDE
seems to be too large and complex. Perhaps it needs to be decomposed into smaller parts, but I have not figured out how.
Contributor (creator of PR) checklist
- [x] Tests updated (for new features and bugfixes)?
- [x] Documentation updated (for new features)?
- [ ] Issue referenced (for PRs that solve an issue)?
For Reviewer
- [x] CHANGELOG updated if important change?
📚 Documentation preview 📚: https://scikit-matter--222.org.readthedocs.build/en/222/
Also, can you please update the CHANGELOG file.
Hi @agoscinski, thank you for your review! I just accepted several modifications and found that the build of docs failed, and the error seems to be out of the scope of this PR. I checked this branch and it is not behind the main branch. Please take a look:
Extension error:
Here is a summary of the problems encountered when running the examples:
Unexpected failing examples:
../../examples/selection/GCH-ROY.py failed leaving traceback:
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/checkouts/222/examples/selection/GCH-ROY.py", line 28, in <module>
roy_data = load_roy_dataset()
^^^^^^^^^^^^^^^^^^
File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/envs/222/lib/python3.12/site-packages/skmatter/datasets/_base.py", line 145, in load_roy_dataset
energies = np.array([f.info["energy"] for f in structures])
~~~~~~^^^^^^^^^^
KeyError: 'energy'
After I merged the main branch to the main branch of my fork, I also found that the build of docs failed.
Hi @agoscinski, thank you for your review! I just accepted several modifications and found that the build of docs failed, and the error seems to be out of the scope of this PR. I checked this branch and it is not behind the main branch. Please take a look:
Extension error: Here is a summary of the problems encountered when running the examples: Unexpected failing examples: ../../examples/selection/GCH-ROY.py failed leaving traceback: Traceback (most recent call last): File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/checkouts/222/examples/selection/GCH-ROY.py", line 28, in <module> roy_data = load_roy_dataset() ^^^^^^^^^^^^^^^^^^ File "/home/docs/checkouts/readthedocs.org/user_builds/scikit-matter/envs/222/lib/python3.12/site-packages/skmatter/datasets/_base.py", line 145, in load_roy_dataset energies = np.array([f.info["energy"] for f in structures]) ~~~~~~^^^^^^^^^^ KeyError: 'energy'
After I merged the main branch to the main branch of my fork, I also found that the build of docs failed.
This is an issue related to the new ase 3.23.0 version that was release a week ago. One can not use the key "energy" anymore to extract the energy but instead should use the get_potential_energy()
methof of the Atoms
instance. We have to fix this in a different PR.
Hi @agoscinski, I think this PR is ready for further review. Thank you for your time!