KDEpy icon indicating copy to clipboard operation
KDEpy copied to clipboard

Add k nearest neighbors and cross validation bandwidth methods

Open inti-abbate opened this issue 3 years ago • 6 comments

Two automatic bandwidth selection methods were added:

  • k nearest neighbors: Calculates the bw for each data point as the distance to the kth neighbor
  • cross validated bandwidth: based on a (new) grid_search_cv function, wich computes the score over a grid of bandwidths. A score method was added to BaseKDE for that purpose. Since grid_search_cv requires a model, the cv bw method could not be implemented as an independent function. Instead it is a method of BaseKDE that calls grid_search_cv with model=self (thru and intermediate function cross_val, which chooses the best bw according to the grid search).

Since both method require parameters other than data and weights, a slight change was introduced to the API: fit method now have the signature fit(self, data, weights=None, **kwargs), where **kwargs are the keyword arguments to be passed to the bw selection method. If no **kwargs is passed, the default parameters will be used, so the change is backwards compatible.

There is an issue in which I'd like to know your opinion. Both methods are quite computationally expensive (for large datasets), so they could benefit from CPU parallelization, which is quite easy to add with joblib library. On the other hand, this would go against the guideline "Import as few external dependencies as possible". Let me know which option you'd prefer.

inti-abbate avatar Jun 01 '21 14:06 inti-abbate

Hi @inti-abbate . Thanks for the PR. Letting you know now that I've seen it, and I will look more closely over it once I find the time. Regarding the last question, I think adding joblib is the way to go - it introduces a dependency, but it's common in similar libraries and it helps speed up computations.

Tests are failing due to black. Please install black and run python -m black KDEpy -l 120 to auto-format the code.

Note to self: see this comment also

tommyod avatar Jun 02 '21 15:06 tommyod

Thanks for the answer. Of course, take your time. I'll (try to) fix the failing checks, and add parallelization with joblib.

inti-abbate avatar Jun 02 '21 16:06 inti-abbate

Sorry about the previous commit, which I force-pushed it back. I just applied black, but flake8 still finds error E203. As I could find in google, this a known issue between black and flake8, and according to here, E203 should be ignored.

inti-abbate avatar Jun 02 '21 16:06 inti-abbate

Feel free to add E203 to flake8 ignore list.

tommyod avatar Jun 02 '21 16:06 tommyod