
V.01 release

Open sjwhitworth opened this issue 11 years ago • 15 comments

Hi everyone. I'd like to formalise what features we want for a V.01 release. By this I mean the first version of GoLearn that is nearly ready for production use externally. We'll learn much more once it's in the hands of users. Docs need to be improved substantially, and we need a few more implementations of algorithms.

What does everyone think?

cc: @ifesdjeen @npbool @macmania @lazywei @marcoseravalli

sjwhitworth avatar May 12 '14 07:05 sjwhitworth

@e-dard @Sentimentron

sjwhitworth avatar May 12 '14 07:05 sjwhitworth

At a minimum, I'd expect

  • ID3/C4.5 Decision trees
  • Naive Bayes
  • Discretisation
  • Random Forests

I've got discretisation and random forests working, and I'm working on ID3 now.

Sentimentron avatar May 12 '14 08:05 Sentimentron
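
For reference, the heart of ID3 is picking the attribute with the highest information gain at each split. Below is a minimal sketch of that calculation over categorical data; the function names and types are illustrative only, not GoLearn's actual API:

```go
package main

import "math"

// entropy computes the Shannon entropy of a slice of class labels.
func entropy(labels []string) float64 {
	counts := map[string]int{}
	for _, l := range labels {
		counts[l]++
	}
	var h float64
	n := float64(len(labels))
	for _, c := range counts {
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// informationGain measures the entropy reduction from splitting the labels
// by the values of a single categorical attribute.
func informationGain(attrValues, labels []string) float64 {
	groups := map[string][]string{}
	for i, v := range attrValues {
		groups[v] = append(groups[v], labels[i])
	}
	remainder := 0.0
	n := float64(len(labels))
	for _, g := range groups {
		remainder += float64(len(g)) / n * entropy(g)
	}
	return entropy(labels) - remainder
}
```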

I think basic linear models are required: logistic regression, linear regression. SVM integration would be great (w/ libsvm). Cross validation is also essential.

@Sentimentron I'm not sure in which cases we would need to use discretization?

lazywei avatar May 12 '14 11:05 lazywei
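
As a rough idea of what the cross validation mentioned above could look like, here is a minimal k-fold sketch in Go. The `Model` interface and helper names are assumptions for illustration, not part of the library:

```go
package main

import "math/rand"

// Model is a stand-in for whatever classifier interface the library settles on.
type Model interface {
	Fit(X [][]float64, y []int)
	Score(X [][]float64, y []int) float64
}

// crossValidate returns the mean score over k folds of shuffled data.
func crossValidate(m Model, X [][]float64, y []int, k int) float64 {
	idx := rand.Perm(len(X))
	foldSize := len(X) / k
	total := 0.0
	for f := 0; f < k; f++ {
		// Mark the indices belonging to the held-out fold.
		testSet := map[int]bool{}
		for _, i := range idx[f*foldSize : (f+1)*foldSize] {
			testSet[i] = true
		}
		var trX, teX [][]float64
		var trY, teY []int
		for i := range X {
			if testSet[i] {
				teX = append(teX, X[i])
				teY = append(teY, y[i])
			} else {
				trX = append(trX, X[i])
				trY = append(trY, y[i])
			}
		}
		m.Fit(trX, trY)
		total += m.Score(teX, teY)
	}
	return total / float64(k)
}
```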

ID3, as an example, only works on categorical attributes (C4.5 relaxes this restriction but is more complex to implement). Similarly, you have to use Gaussian Naive Bayes if you want to handle continuous attributes (its underlying assumption - that continuous attributes are normally distributed - is not always true).

Sentimentron avatar May 12 '14 11:05 Sentimentron
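
To make the point concrete: discretisation maps a continuous attribute onto a small set of bins so that categorical-only learners like ID3 can consume it. A minimal equal-width binning sketch follows; this is just one possible scheme, not necessarily the one used in the library:

```go
package main

// equalWidthBins maps each continuous value to one of n equal-width bins,
// returning bin indices that can be treated as categorical values.
func equalWidthBins(values []float64, n int) []int {
	min, max := values[0], values[0]
	for _, v := range values {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
	}
	bins := make([]int, len(values))
	width := (max - min) / float64(n)
	if width == 0 { // all values identical: everything lands in bin 0
		return bins
	}
	for i, v := range values {
		b := int((v - min) / width)
		if b >= n { // the maximum value falls on the upper edge
			b = n - 1
		}
		bins[i] = b
	}
	return bins
}
```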

I think, rather than focussing on features the library needs to reach a specific bar, it's healthier to merely order the features we want in an order to tackle them.

I have an old naive Bayes implementation in Python I could port over as a first step. Could also look at implementing GNB if people think it's important after that.

One class of algorithms that is missing, and which I have a few Go implementations of, is multi-armed bandits, a very useful reinforcement learning technique. I'd be happy to port these into the library.

e-dard avatar May 12 '14 15:05 e-dard
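
For anyone unfamiliar, the simplest bandit strategy is epsilon-greedy: exploit the best-known arm most of the time and explore a random arm with some small probability. A minimal sketch (the names are illustrative, not taken from the existing Go implementations mentioned above):

```go
package main

import "math/rand"

// EpsilonGreedy keeps a running mean reward per arm and explores
// with probability Epsilon.
type EpsilonGreedy struct {
	Epsilon float64
	Counts  []int
	Values  []float64
}

func NewEpsilonGreedy(arms int, eps float64) *EpsilonGreedy {
	return &EpsilonGreedy{
		Epsilon: eps,
		Counts:  make([]int, arms),
		Values:  make([]float64, arms),
	}
}

// SelectArm picks a random arm with probability Epsilon, otherwise the
// arm with the highest estimated reward so far.
func (e *EpsilonGreedy) SelectArm() int {
	if rand.Float64() < e.Epsilon {
		return rand.Intn(len(e.Values))
	}
	best := 0
	for i, v := range e.Values {
		if v > e.Values[best] {
			best = i
		}
	}
	return best
}

// Update incorporates the reward observed after pulling an arm,
// using an incremental mean.
func (e *EpsilonGreedy) Update(arm int, reward float64) {
	e.Counts[arm]++
	e.Values[arm] += (reward - e.Values[arm]) / float64(e.Counts[arm])
}
```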

@Sentimentron: Random forests would be great. I think that someone had already started to implement Naive Bayes.

@e-dard: Agreed. I just think it's useful to have some idea of 'minimal stable functionality' before we start promoting it more widely.

sjwhitworth avatar May 12 '14 21:05 sjwhitworth

Is someone working on Naive Bayes? I didn't see anything explicit in the issues list. I was working on a port of my Python implementation.

e-dard avatar May 12 '14 21:05 e-dard

This is what I've seen so far, but it seems pretty nascent. https://github.com/tncardoso/golearn/tree/feature/naive/naive

Maybe it would be good to sync up with him.

sjwhitworth avatar May 12 '14 21:05 sjwhitworth

Any more thoughts? I think:

  • Random Forests
  • Naive Bayes
  • Stochastic/batch gradient descent
  • Cross Validation
  • Linear Regression
  • Logistic Regression

would be a great start.

sjwhitworth avatar May 13 '14 21:05 sjwhitworth
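
Since stochastic gradient descent underpins both of the linear models on that list, here is a minimal sketch of SGD applied to logistic regression, assuming a bias term is already appended to each feature row; this is illustrative only, not the eventual implementation:

```go
package main

import "math"

func sigmoid(z float64) float64 { return 1.0 / (1.0 + math.Exp(-z)) }

// sgdLogistic fits logistic regression weights by taking one gradient
// step per training example for a fixed number of epochs.
func sgdLogistic(X [][]float64, y []float64, lr float64, epochs int) []float64 {
	w := make([]float64, len(X[0]))
	for e := 0; e < epochs; e++ {
		for i, row := range X {
			var z float64
			for j, x := range row {
				z += w[j] * x
			}
			err := sigmoid(z) - y[i] // gradient of the log loss w.r.t. z
			for j, x := range row {
				w[j] -= lr * err * x
			}
		}
	}
	return w
}
```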

So we now have everything else on that list, just leaving:

  • Linear regression
  • Logistic regression

Is the end of June a good target?

Sentimentron avatar May 28 '14 13:05 Sentimentron

That sounds good to me. Logistic regression should be ready to merge after @npbool makes some changes. That only leaves linear regression.

sjwhitworth avatar May 28 '14 14:05 sjwhitworth

I think we've actually merged everything in that list.

Sentimentron avatar Aug 03 '14 22:08 Sentimentron

Reckon we're ready to go for a first proper release? Brilliant work @Sentimentron + all.

sjwhitworth avatar Aug 10 '14 14:08 sjwhitworth

Are we going to tag before or after #62?

Sentimentron avatar Aug 10 '14 15:08 Sentimentron

I don't mind. It all looked good to me.

sjwhitworth avatar Aug 10 '14 16:08 sjwhitworth