doddle-model

Implement a KNeighborsClassifier

Open • binkabir opened this issue 6 years ago • 7 comments

KNeighborsClassifier seems to be a very popular classification algorithm. Do you have any plans/timeline for implementing it?

Cheers.

binkabir • Jan 18 '19 16:01

Hey. There's no particular timeline, but it would be a welcome contribution. Would you perhaps be interested in tackling this? Let me know if you need any help 🙂.

inejc • Jan 18 '19 18:01

Hi @inejc. I'd be willing to help with an algorithm implementation like this one, but I'm only good at Scala programming and a beginner in ML.

binkabir • Jan 18 '19 18:01

@binkabir that's perfectly fine, you are in a very good position if you know Scala 🙂. There are plenty of resources available for free online and I can provide additional help with the actual algorithm. A good start would be to learn how the algorithm works. Let me know if you need any help finding learning material.

inejc • Jan 18 '19 18:01

Sure, I have some materials, but it would be awesome if you could share more.

binkabir • Jan 18 '19 19:01

Great, I'll provide some additional links over the weekend.

inejc • Jan 18 '19 19:01

Hi @binkabir. For a quick overview of the algorithm I would suggest these resources:

  • https://medium.com/machine-learning-101/k-nearest-neighbors-classifier-1c1ff404d265
  • https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
  • https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
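
To make the idea in those resources concrete, here is a minimal brute-force sketch in plain Scala. It is not doddle-model code: the object name `KnnSketch`, the array-based signatures, the Euclidean metric and the tie-breaking rule are all assumptions chosen for illustration. The classifier stores the training data and predicts by majority vote among the `k` closest training points.

```scala
// Illustrative sketch only; names and signatures are hypothetical, not doddle-model's API.
object KnnSketch {

  /** Euclidean distance between two equally sized feature vectors. */
  def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same length")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  /** Predicts the label of `query` by majority vote among its k nearest training points. */
  def predict(
      xTrain: Array[Array[Double]],
      yTrain: Array[Int],
      query: Array[Double],
      k: Int
  ): Int = {
    require(xTrain.length == yTrain.length, "every training row needs a label")
    require(k >= 1 && k <= xTrain.length, "k must be between 1 and the number of training rows")

    // Compute the distance to every training point, then keep the labels of the k closest.
    val nearestLabels = xTrain
      .zip(yTrain)
      .map { case (x, label) => (euclidean(x, query), label) }
      .sortWith(_._1 < _._1)
      .take(k)
      .map(_._2)

    // Majority vote; ties go to the smaller label so the result is deterministic (an assumption).
    nearestLabels
      .groupBy(identity)
      .map { case (label, group) => (label, group.length) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .head
      ._1
  }
}
```

Every prediction scans the whole training set, which is why the tree structures are worth studying once the basic version is clear.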

After that, it would be useful to study the following data structures:

  • https://en.wikipedia.org/wiki/K-d_tree
  • https://en.wikipedia.org/wiki/Ball_tree
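
As a rough illustration of how a k-d tree speeds up the neighbor search, here is a small sketch that builds a tree by median splits and finds the single nearest point. The names (`KdTreeSketch`, `build`, `nearest`) are made up for this example, and a real implementation would need to return `k` neighbors and handle larger data more carefully.

```scala
// Illustrative sketch only; names are hypothetical and not part of doddle-model.
object KdTreeSketch {
  sealed trait KdTree
  case object Empty extends KdTree
  final case class Node(point: Vector[Double], axis: Int, left: KdTree, right: KdTree) extends KdTree

  /** Builds a tree by splitting on the median, cycling through the axes with depth. */
  def build(points: Seq[Vector[Double]], depth: Int = 0): KdTree =
    if (points.isEmpty) Empty
    else {
      val axis = depth % points.head.length
      val sorted = points.sortWith((a, b) => a(axis) < b(axis))
      val mid = sorted.length / 2
      Node(sorted(mid), axis, build(sorted.take(mid), depth + 1), build(sorted.drop(mid + 1), depth + 1))
    }

  def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Returns a stored point closest to `query`, or None for an empty tree. */
  def nearest(tree: KdTree, query: Vector[Double]): Option[Vector[Double]] = {
    def closer(a: Option[Vector[Double]], b: Option[Vector[Double]]): Option[Vector[Double]] =
      (a, b) match {
        case (Some(p), Some(q)) =>
          if (squaredDistance(p, query) <= squaredDistance(q, query)) a else b
        case _ => a.orElse(b)
      }

    def search(t: KdTree, best: Option[Vector[Double]]): Option[Vector[Double]] = t match {
      case Empty => best
      case Node(point, axis, left, right) =>
        val diff = query(axis) - point(axis)
        // Visit the side of the splitting plane that contains the query first.
        val (near, far) = if (diff < 0) (left, right) else (right, left)
        val bestNear = search(near, closer(best, Some(point)))
        // The far side can only help if the plane is closer than the best match so far.
        val mustCheckFar = bestNear.forall(b => diff * diff < squaredDistance(b, query))
        if (mustCheckFar) search(far, bestNear) else bestNear
    }

    search(tree, None)
  }
}
```

The pruning check in `search` is what lets the tree skip whole regions of the data. A ball tree follows the same idea but partitions points into nested hyperspheres instead of axis-aligned splits.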

Let me know if you need more math-heavy resources for the above and I'll do my best to provide them. You can also try to search for them on Google Scholar.

Note that estimators in doddle-model are implemented using typeclasses. If you are not familiar with them, you can take a look at:

  • https://alvinalexander.com/scala/fp-book/type-classes-101-introduction
  • https://tpolecat.github.io/2015/04/29/f-bounds.html

You can find the basic doddle-model typeclasses here. If you decide to implement the classification algorithm you will need to implement an instance of the Classifier typeclass. Here is an example of how this is done for the most frequent (dummy) classifier.
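
For readers new to the pattern, the sketch below shows the general shape of a typeclass-based estimator. It is deliberately not doddle-model's actual `Classifier` typeclass: `SimpleClassifier`, `MostFrequent` and `fitAndPredict` are hypothetical names with simplified signatures, so check the real typeclass definitions in the repository before implementing anything.

```scala
// Illustrative sketch only; this is NOT doddle-model's Classifier typeclass.
object TypeclassSketch {

  // Hypothetical typeclass: behaviour is defined outside the model type itself.
  trait SimpleClassifier[A] {
    def fit(model: A, x: Seq[Vector[Double]], y: Seq[Int]): A
    def predict(model: A, x: Seq[Vector[Double]]): Seq[Int]
  }

  // Immutable model state; `mostFrequent` stays None until `fit` has been called.
  final case class MostFrequent(mostFrequent: Option[Int] = None)

  object MostFrequent {
    // Placing the instance in the companion object lets implicit resolution find it.
    implicit val classifier: SimpleClassifier[MostFrequent] =
      new SimpleClassifier[MostFrequent] {
        def fit(model: MostFrequent, x: Seq[Vector[Double]], y: Seq[Int]): MostFrequent = {
          require(y.nonEmpty, "cannot fit on an empty target")
          // Most common label; ties go to the smaller label (an assumption of this sketch).
          val winner = y
            .groupBy(identity)
            .map { case (label, group) => (label, group.size) }
            .toSeq
            .sortBy { case (label, count) => (-count, label) }
            .head
            ._1
          model.copy(mostFrequent = Some(winner))
        }

        def predict(model: MostFrequent, x: Seq[Vector[Double]]): Seq[Int] = {
          val label = model.mostFrequent.getOrElse(
            throw new IllegalStateException("model is not fitted")
          )
          x.map(_ => label)
        }
      }
  }

  // Generic code that works for any model type with a SimpleClassifier instance in scope.
  def fitAndPredict[A](
      model: A,
      xTrain: Seq[Vector[Double]],
      yTrain: Seq[Int],
      xTest: Seq[Vector[Double]]
  )(implicit ev: SimpleClassifier[A]): Seq[Int] =
    ev.predict(ev.fit(model, xTrain, yTrain), xTest)
}
```

A k-nearest-neighbors model would follow the same shape: a case class holding `k` and the stored training data, plus an instance whose `predict` performs the neighbor search.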

Don't hesitate to ask if you need any more help. Thanks for being interested in making a contribution, it's really appreciated.

inejc • Jan 22 '19 22:01

This is awesome, I will have a look at it. Thanks.

binkabir • Jan 23 '19 10:01