KDEpy
KDEpy copied to clipboard
Add k nearest neighbors and cross validation bandwidth methods
Two automatic bandwidth selection methods were added:
- k nearest neighbors: Calculates the bw for each data point as the distance to the kth neighbor
- cross validated bandwidth: based on a (new)
grid_search_cv
function, wich computes the score over a grid of bandwidths. A score method was added to BaseKDE for that purpose. Sincegrid_search_cv
requires a model, the cv bw method could not be implemented as an independent function. Instead it is a method ofBaseKDE
that callsgrid_search_cv
withmodel=self
(thru and intermediate functioncross_val
, which chooses the best bw according to the grid search).
Since both method require parameters other than data
and weights
, a slight change was introduced to the API:
fit
method now have the signature fit(self, data, weights=None, **kwargs)
, where **kwargs
are the keyword arguments to be passed to the bw selection method. If no **kwargs
is passed, the default parameters will be used, so the change is backwards compatible.
There is an issue in which I'd like to know your opinion. Both methods are quite computationally expensive (for large datasets), so they could benefit from CPU parallelization, which is quite easy to add with joblib
library. On the other hand, this would go against the guideline "Import as few external dependencies as possible". Let me know which option you'd prefer.
Hi @inti-abbate . Thanks for the PR. Letting you know now that I've seen it, and I will look more closely over it once I find the time. Regarding the last question, I think adding joblib is the way to go - it introduces a dependency, but it's common in similar libraries and it helps speed up computations.
Tests are failing due to black. Please install black and run python -m black KDEpy -l 120
to auto-format the code.
Note to self: see this comment also
Thanks for the answer. Of course, take your time. I'll (try to) fix the failing checks, and add parallelization with joblib.
Sorry about the previous commit, which I force-pushed it back. I just applied black, but flake8 still finds error E203. As I could find in google, this a known issue between black and flake8, and according to here, E203 should be ignored.
Feel free to add E203 to flake8 ignore list.