golearn

Direction of this project

Open macmania opened this issue 10 years ago • 45 comments

I was wondering if it would be better to focus on a particular class of machine learning problems that is computationally intensive (dimensionality reduction, neural networks, Fourier transforms, etc.) rather than rewriting the algorithms. My take is that if we just use one of the source-to-source translators (Python -> Go or C++ -> Go), we might be able to save some time. The team can then just do some tests.

macmania avatar Apr 29 '14 17:04 macmania

I'd rather write them ourselves. That way, we can better optimise to any advantages that Go will give us - e.g. using goroutines to independently train random forests. Plus, I could do with the practice :)
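
As an illustration of the goroutine idea mentioned above, here is a minimal sketch, not golearn code. `Stump`, `trainStump`, `TrainForest`, and `Vote` are hypothetical names, and the "training" step is deliberately trivial (a threshold set to the mean of a bootstrap sample) so that the concurrency pattern stays visible: one goroutine per tree, each writing to its own slot of a pre-allocated slice.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// Stump is a hypothetical stand-in for a trained decision tree: it predicts
// 1 when the feature value exceeds Threshold and 0 otherwise.
type Stump struct {
	Threshold float64
}

// Predict returns the class label (0 or 1) for a single feature value.
func (s Stump) Predict(x float64) int {
	if x > s.Threshold {
		return 1
	}
	return 0
}

// trainStump "trains" one stump on a bootstrap sample of xs. The threshold is
// simply the mean of the sampled values; a real tree learner would go here.
func trainStump(xs []float64, seed int64) Stump {
	rng := rand.New(rand.NewSource(seed)) // per-goroutine RNG, no shared state
	sum := 0.0
	for i := 0; i < len(xs); i++ {
		sum += xs[rng.Intn(len(xs))]
	}
	return Stump{Threshold: sum / float64(len(xs))}
}

// TrainForest trains n stumps concurrently, one goroutine per stump.
func TrainForest(xs []float64, n int) []Stump {
	forest := make([]Stump, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so no lock is needed.
			forest[i] = trainStump(xs, int64(i))
		}(i)
	}
	wg.Wait() // block until every stump has been trained
	return forest
}

// Vote returns the majority prediction of the forest for x.
func Vote(forest []Stump, x float64) int {
	ones := 0
	for _, s := range forest {
		ones += s.Predict(x)
	}
	if 2*ones > len(forest) {
		return 1
	}
	return 0
}

func main() {
	xs := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	forest := TrainForest(xs, 10)
	fmt.Println(Vote(forest, 0.5), Vote(forest, 9.0))
}
```

Because the trees are independent, no synchronisation is needed beyond the `sync.WaitGroup`; a real implementation would probably also cap the number of goroutines running at once.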

sjwhitworth avatar Apr 29 '14 20:04 sjwhitworth

Then I think we may need to implement some basic matrix computation algorithms first? If we want to rebuild all the algorithms ourselves, we will definitely need some fundamental infrastructure.

lazywei avatar Apr 30 '14 12:04 lazywei

Yes, I agree. There has been a good start on a matrix library that is imported in this library, but it's probable that we will need to do a lot more ourselves. Eigenvectors, etc.

sjwhitworth avatar Apr 30 '14 13:04 sjwhitworth

If we want to do so many things ourselves, we should probably think of a better way to organize the packages / subpackages etc. Any thoughts on this?

lazywei avatar Apr 30 '14 13:04 lazywei

I'm agnostic. I'm willing to take suggestions on it, or just to start writing things, and continually refactor.

sjwhitworth avatar Apr 30 '14 15:04 sjwhitworth

Can you work on the documentation part of the project, just to give the people working on it an idea of how to run and test a particular component? And do you have any coding standards in mind: formatting, naming conventions, etc.? :)

macmania avatar Apr 30 '14 23:04 macmania

Agree! We do need some formatting and naming conventions so that others can contribute more easily. For formatting, I recommend using goimports: http://godoc.org/code.google.com/p/go.tools/cmd/goimports For naming conventions, I have no idea yet. Another point: how do we organize so many learning algorithms? Each algorithm in its own "golearn/xxx" package?

lazywei avatar May 01 '14 03:05 lazywei

@macmania: Sure I can. Like I said, I'm definitely willing to take suggestions from others. I don't want to be the sole arbiter of the project's direction. I will try and write some up tonight.

@lazywei: We should group algorithms together by commonalities. For example, a decision tree package can contain CART, Random Forests, etc. A neural network package could contain simple neural nets, RBMs, etc. So, not grouping by supervised/unsupervised, but by shared approaches to learning methods. I'd imagine that we'd also have a data package for cross validation/label encoding, some additional matrix/distance methods in another package, and some utility functions.

sjwhitworth avatar May 01 '14 06:05 sjwhitworth

Would it be better to group the algorithms by supervised and unsupervised? We can do that later on if we end up with tons of packages. @sjwhitworth: sounds good, I'll work on the library after you've done the documentation ^_^

macmania avatar May 01 '14 06:05 macmania

@sjwhitworth: let me know if you need help writing the documentation for the project.

macmania avatar May 01 '14 06:05 macmania

@macmania: Sure, why not. Let's just get writing some stuff, and we can refactor when we need to :) Definitely would love help writing documentation - I'll just put it all in a Markdown file for now, and then we can pretty it up later.

sjwhitworth avatar May 01 '14 07:05 sjwhitworth

awesome - do you mind putting a google drive document url so we can edit as we go?

macmania avatar May 01 '14 07:05 macmania

Sure, here you go -> https://docs.google.com/document/d/1x21Y-g1rga0LTwC_LnKHi0y7RjFzd2Il7YB47rp7kTA/edit?usp=sharing

sjwhitworth avatar May 01 '14 07:05 sjwhitworth

can you make it editable?

macmania avatar May 01 '14 07:05 macmania

Hi! I find this an interesting project and I would like to collaborate. I have only a couple of questions:

  1. Would it make sense to have the tests in a separate folder? So that they don't "spoil" the code within the library
  2. Is there a precise reason why the import statements refer to the $GOPATH rather than to the local package? The thing is that if I make some changes, I don't see them in the examples. I hope the explanation is understandable.

I can implement the regressions, because I have already done it in C.

mseravalli avatar May 01 '14 12:05 mseravalli

@marcoseravalli I'm not sure whether it is good practice. However, I found it in the "Choose a good import path" section. Maybe we should follow that. You can pull this repository, make changes, and commit to your fork. The point is that you may need to pull this repo into github.com/sjwhitworth/golearn instead of github.com/marcoseravalli/golearn locally.

lazywei avatar May 01 '14 13:05 lazywei

Surely it's easier if we just keep everything in the same repo, and then just send pull requests?

Yes, we can have tests in a separate folder. Let's not worry too much about structure at this point. We should start writing stuff, and then refactor as we go along - otherwise, it's premature optimisation.

sjwhitworth avatar May 01 '14 13:05 sjwhitworth

I read a bit more about the structure of the packages and it makes sense to keep it the current way.

Ok sure, let's start writing some stuff!

mseravalli avatar May 01 '14 14:05 mseravalli

What would you like to start working on?

sjwhitworth avatar May 01 '14 14:05 sjwhitworth

I can start working on the regressions. Is it ok?

mseravalli avatar May 01 '14 15:05 mseravalli

Logistic/linear or both?

sjwhitworth avatar May 01 '14 15:05 sjwhitworth

I would start with linear first. Then I can move to the logistic.

mseravalli avatar May 01 '14 15:05 mseravalli

I opened an issue and assigned it to you, @marcoseravalli .

sjwhitworth avatar May 01 '14 15:05 sjwhitworth

ok cool! i'll start working on it!

mseravalli avatar May 01 '14 16:05 mseravalli

Hey @sjwhitworth - can you open up an issue for me - neural networks :) Thanks!

macmania avatar May 02 '14 13:05 macmania

Do you guys also have experience with other algebra libraries for Go? I found this one, for example, which also has C bindings to BLAS: http://godoc.org/code.google.com/p/biogo.matrix

mseravalli avatar May 02 '14 14:05 mseravalli

I also found an interesting organization on GitHub: https://github.com/gonum They implement some numeric libraries.

I think we should consider whether or not to keep using go.matrix in the future. ML depends heavily on linear algebra, matrix computations, etc., and go.matrix no longer seems to be maintained. In addition, I found the matrix product in go.matrix to be a little inconsistent when I implemented the metric functions. So we may need to find a better or more active package.

lazywei avatar May 02 '14 14:05 lazywei

It's up to you guys. We should probably fork whichever library you prefer, build on top of that, and merge back into their master if it proves beneficial for them. I'll leave the decision up to you two, @lazywei + @marcoseravalli.

sjwhitworth avatar May 02 '14 15:05 sjwhitworth

It seems that both gonum/matrix/mat64 and biogo.matrix were created by the same author, and that the former is more active. So I prefer to use gonum/matrix/mat64. What do you say, @marcoseravalli ?

lazywei avatar May 02 '14 16:05 lazywei

Let's move to mat64.

sjwhitworth avatar May 02 '14 20:05 sjwhitworth

I also think that mat64 is a good choice

mseravalli avatar May 03 '14 13:05 mseravalli

The documentation is pretty rubbish for mat64 @lazywei. We should fork, and write our own docs ourselves.

sjwhitworth avatar May 03 '14 18:05 sjwhitworth

Totally agree with you! I will fork it, add docs for the functions whose behaviour I understand, and send a PR back upstream.

lazywei avatar May 04 '14 02:05 lazywei

Or could forking biogo.matrix also be an option? It seems to be better documented: http://godoc.org/code.google.com/p/biogo.matrix And since the project is still under development, we can both benefit from the changes. What do you think?

mseravalli avatar May 05 '14 19:05 mseravalli

I think that sounds sensible. They seem to implement the same things. And if we're going to make a decision, it should be now, before we have to port lots of code. What say you, @lazywei ?

sjwhitworth avatar May 05 '14 19:05 sjwhitworth

biogo.matrix seems also to provide some more features wrt error handling, but less arithmetic...

mseravalli avatar May 05 '14 19:05 mseravalli

I love the presentation and docs on this project, and I think it is going in the right direction. Going forward, I would like to suggest that we implement a simple "WEB/API" package which can be used as a drop-in replacement for some external service, or simply as a showcase for real-life usage.

This would be a package containing a few simple API endpoints that are very easy to understand and use (inspired by the Seldon project), for example:

  • An /event/ endpoint to consume input data (general, or with a model schema definition).
  • A /predict/ endpoint to output a prediction based on what has been trained/consumed so far.
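
The two endpoints above could be sketched with the standard library roughly as follows. This is only an illustration under stated assumptions: `LabelCounter`, `NewMux`, and the `{"label": ...}` JSON payload are hypothetical and not part of golearn or Seldon, and the "model" simply predicts the most frequent label it has seen. `main` drives the handlers in-process through `net/http/httptest` rather than opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// LabelCounter is a hypothetical toy model: it counts the labels it has
// observed and predicts the most frequent one.
type LabelCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

// NewLabelCounter returns an empty model.
func NewLabelCounter() *LabelCounter {
	return &LabelCounter{counts: make(map[string]int)}
}

// Observe records one occurrence of label.
func (m *LabelCounter) Observe(label string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[label]++
}

// Predict returns the most frequent label seen so far, or "" if none.
// Ties are broken lexicographically so the result is deterministic.
func (m *LabelCounter) Predict() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	best, bestN := "", 0
	for label, n := range m.counts {
		if n > bestN || (n == bestN && label < best) {
			best, bestN = label, n
		}
	}
	return best
}

// event is the assumed JSON payload for /event/.
type event struct {
	Label string `json:"label"`
}

// NewMux wires the /event/ and /predict/ endpoints to the model.
func NewMux(m *LabelCounter) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/event/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var e event
		if err := json.NewDecoder(r.Body).Decode(&e); err != nil || e.Label == "" {
			http.Error(w, `expected JSON like {"label": "spam"}`, http.StatusBadRequest)
			return
		}
		m.Observe(e.Label)
		w.WriteHeader(http.StatusAccepted)
	})
	mux.HandleFunc("/predict/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"prediction": m.Predict()})
	})
	return mux
}

func main() {
	mux := NewMux(NewLabelCounter())
	// Feed three events, then ask for a prediction, all without a network socket.
	for _, label := range []string{"spam", "ham", "spam"} {
		body := strings.NewReader(`{"label":"` + label + `"}`)
		req := httptest.NewRequest(http.MethodPost, "/event/", body)
		mux.ServeHTTP(httptest.NewRecorder(), req)
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/predict/", nil))
	fmt.Print(rec.Body.String())
}
```

To serve real traffic, the same mux could be passed to `http.ListenAndServe`; authentication and persistence are left out on purpose.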

This could attract a wider audience with a programming/web background who are seeking a simple prediction/recommendation solution without a strong ML background (myself being one of them).

Currently, although this project is great in golang, if I were to deploy something ML-related in a web stack like Python/Java, I would build it with statsmodels or opt for an external service, because they are easy to use and understand. But given the awesomeness of golang concurrency and its much better latency for API development, this project could also be a great alternative for those people, showing them how to build a simple use case.

What do you think, @sjwhitworth and everyone? Although I'm only a beginner in Golang, I would like to contribute to this great project as much as I can.

anzellai avatar Sep 26 '15 18:09 anzellai

@anzellai I think it's a good idea so people can get an understanding of how to use this project. I do think such a thing should be a separate project repo though.

nickpoorman avatar Sep 26 '15 18:09 nickpoorman

@nickpoorman I don't disagree. My suggestion is simply a way forward to attract a wider audience while following this project's spirit of being simple to use and understand.

Let's see what people think about this idea, and then we can consider how to implement it.

anzellai avatar Sep 26 '15 18:09 anzellai

I've spent some time thinking about this and I think it's a good idea. We need to improve our APIs to support various streaming/low-volume retraining and prediction events. At the moment, things in base assume that data will generally be of a fixed size; I think it's time to change that assumption.
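
One possible shape for such a streaming-friendly API, sketched under assumptions: `OnlineRegressor` and `RunningMean` are hypothetical names, not anything in golearn's base package. The point is only that the model is updated one observation at a time and never needs to know the dataset size in advance.

```go
package main

import "fmt"

// OnlineRegressor is a hypothetical interface for models that learn from one
// observation at a time instead of from a fixed-size dataset.
type OnlineRegressor interface {
	Update(x []float64, y float64)
	Predict(x []float64) float64
}

// RunningMean is a baseline OnlineRegressor: it ignores the features and
// predicts the mean of all targets seen so far.
type RunningMean struct {
	n    int
	mean float64
}

// Update folds one new target into the running mean in O(1) time and memory.
func (r *RunningMean) Update(_ []float64, y float64) {
	r.n++
	r.mean += (y - r.mean) / float64(r.n)
}

// Predict returns the current mean (0 before any update).
func (r *RunningMean) Predict(_ []float64) float64 {
	return r.mean
}

func main() {
	var model OnlineRegressor = &RunningMean{}
	for _, y := range []float64{1, 2, 3} {
		model.Update(nil, y)
		fmt.Println(model.Predict(nil)) // prediction after each event
	}
}
```

A real learner would use the feature vector, but the same two-method contract would let a service retrain on each incoming event and predict between events.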

Sentimentron avatar Oct 07 '15 10:10 Sentimentron

@Sentimentron +1

nickpoorman avatar Oct 07 '15 12:10 nickpoorman

+1. Perhaps we can create a resources package, simply wrapped with an interface for a user-defined model, to provide instant API endpoints, something like:

  • /api/{resources}/data/ [POST]
  • /api/{resources}/event/ [POST]
  • /api/{resources}/predict/** [GET]
  • /api/{resources}/report/{method}/ [GET]

We can also keep it really simple and leave all the authentication stuff for the user to implement.

anzellai avatar Oct 07 '15 14:10 anzellai

I specifically wrote my own kNN and k-means clustering algorithms so I could work with them in a service, because golearn took more of a data-analysis and very static approach to modeling and training. This is just feedback. At the time (a year ago) it made sense; things may have changed. It might make sense to think about how the library could be used in a service and to have an example of that. I think a full ML API that adapts and retrains is a little beyond a core library's scope.

My 2c

savorywatt avatar Oct 07 '15 23:10 savorywatt

@savorywatt I agree with your points, and it's probably time to start a separate repo to implement a full ML API. I also think some database support would be a good idea.

For everyone else, may I ask who would like to drive this forward? I would like to help (or be part of it) and make this happen.

anzellai avatar Feb 05 '16 15:02 anzellai

This issue has been open for more than 6 years now, and it's not about something that can be fixed in code. Can it be closed?

chrmang avatar Nov 20 '20 19:11 chrmang