
V.01 release

Open sjwhitworth opened this issue 11 years ago • 15 comments

Hi everyone. I'd like to formalise what features we want for a V.01 release. By this I mean the first version of GoLearn that is nearly ready for production use externally. We'll learn much more once it's in the hands of users. Docs need to be improved substantially, and we need a few more implementations of algorithms.

What does everyone think?

cc: @ifesdjeen @npbool @macmania @lazywei @marcoseravalli

sjwhitworth avatar May 12 '14 07:05 sjwhitworth

@e-dard @Sentimentron

sjwhitworth avatar May 12 '14 07:05 sjwhitworth

At a minimum, I'd expect

  • ID3/C4.5 Decision trees
  • Naive Bayes
  • Discretisation
  • Random Forests

I've got discretisation and random forests working, and I'm working on ID3 now.

Sentimentron avatar May 12 '14 08:05 Sentimentron
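
For reference, the heart of ID3 is picking the attribute with the highest information gain at each split. Below is a minimal sketch of that calculation over categorical data; the function names and types are illustrative only, not GoLearn's actual API:

```go
package main

import "math"

// entropy computes the Shannon entropy of a slice of class labels.
func entropy(labels []string) float64 {
	counts := map[string]int{}
	for _, l := range labels {
		counts[l]++
	}
	var h float64
	n := float64(len(labels))
	for _, c := range counts {
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// informationGain measures the entropy reduction from splitting the labels
// by the values of a single categorical attribute.
func informationGain(attrValues, labels []string) float64 {
	groups := map[string][]string{}
	for i, v := range attrValues {
		groups[v] = append(groups[v], labels[i])
	}
	remainder := 0.0
	n := float64(len(labels))
	for _, g := range groups {
		remainder += float64(len(g)) / n * entropy(g)
	}
	return entropy(labels) - remainder
}
```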

I think basic linear models are required: logistic regression, linear regression. SVM integration would be great (w/ libsvm). Cross validation is also essential.

@Sentimentron I'm not sure in which cases we would need to use discretization?

lazywei avatar May 12 '14 11:05 lazywei
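
As a rough idea of what the cross validation mentioned above could look like, here is a minimal k-fold sketch in Go. The `Model` interface and helper names are assumptions for illustration, not part of the library:

```go
package main

import "math/rand"

// Model is a stand-in for whatever classifier interface the library settles on.
type Model interface {
	Fit(X [][]float64, y []int)
	Score(X [][]float64, y []int) float64
}

// crossValidate returns the mean score over k folds of shuffled data.
func crossValidate(m Model, X [][]float64, y []int, k int) float64 {
	idx := rand.Perm(len(X))
	foldSize := len(X) / k
	total := 0.0
	for f := 0; f < k; f++ {
		// Mark the indices belonging to the held-out fold.
		testSet := map[int]bool{}
		for _, i := range idx[f*foldSize : (f+1)*foldSize] {
			testSet[i] = true
		}
		var trX, teX [][]float64
		var trY, teY []int
		for i := range X {
			if testSet[i] {
				teX = append(teX, X[i])
				teY = append(teY, y[i])
			} else {
				trX = append(trX, X[i])
				trY = append(trY, y[i])
			}
		}
		m.Fit(trX, trY)
		total += m.Score(teX, teY)
	}
	return total / float64(k)
}
```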

ID3, as an example, only works on categorical attributes (C4.5 relaxes this restriction but is more complex to implement). Similarly, you have to use Gaussian Naive Bayes if you want to handle continuous attributes (its underlying assumption - that continuous attributes are normally distributed - is not always true).

Sentimentron avatar May 12 '14 11:05 Sentimentron
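
To make the point concrete: discretisation maps a continuous attribute onto a small set of bins so that categorical-only learners like ID3 can consume it. A minimal equal-width binning sketch follows; this is just one possible scheme, not necessarily the one used in the library:

```go
package main

// equalWidthBins maps each continuous value to one of n equal-width bins,
// returning bin indices that can be treated as categorical values.
func equalWidthBins(values []float64, n int) []int {
	min, max := values[0], values[0]
	for _, v := range values {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
	}
	bins := make([]int, len(values))
	width := (max - min) / float64(n)
	if width == 0 { // all values identical: everything lands in bin 0
		return bins
	}
	for i, v := range values {
		b := int((v - min) / width)
		if b >= n { // the maximum value falls on the upper edge
			b = n - 1
		}
		bins[i] = b
	}
	return bins
}
```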

I think, rather than focussing on features the library needs to reach a specific bar, it's healthier to merely order the features we want in an order to tackle them.

I have an old naive Bayes implementation in Python I could port over as a first step. Could also look at implementing GNB if people think it's important after that.

One class of algorithms that is missing, and which I have a few Go implementations of, is multi-armed bandits, a very useful reinforcement learning technique. I'd be happy to port these into the library.

e-dard avatar May 12 '14 15:05 e-dard
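
For anyone unfamiliar, the simplest bandit strategy is epsilon-greedy: exploit the best-known arm most of the time and explore a random arm with some small probability. A minimal sketch (the names are illustrative, not taken from the existing Go implementations mentioned above):

```go
package main

import "math/rand"

// EpsilonGreedy keeps a running mean reward per arm and explores
// with probability Epsilon.
type EpsilonGreedy struct {
	Epsilon float64
	Counts  []int
	Values  []float64
}

func NewEpsilonGreedy(arms int, eps float64) *EpsilonGreedy {
	return &EpsilonGreedy{
		Epsilon: eps,
		Counts:  make([]int, arms),
		Values:  make([]float64, arms),
	}
}

// SelectArm picks a random arm with probability Epsilon, otherwise the
// arm with the highest estimated reward so far.
func (e *EpsilonGreedy) SelectArm() int {
	if rand.Float64() < e.Epsilon {
		return rand.Intn(len(e.Values))
	}
	best := 0
	for i, v := range e.Values {
		if v > e.Values[best] {
			best = i
		}
	}
	return best
}

// Update incorporates the reward observed after pulling an arm,
// using an incremental mean.
func (e *EpsilonGreedy) Update(arm int, reward float64) {
	e.Counts[arm]++
	e.Values[arm] += (reward - e.Values[arm]) / float64(e.Counts[arm])
}
```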

@Sentimentron: Random forests would be great. I think that someone had already started to implement Naive Bayes.

@e-dard: Agreed. I just think it's useful to have some idea of 'minimal stable functionality' before we start promoting it more widely.

sjwhitworth avatar May 12 '14 21:05 sjwhitworth

Is someone working on Naive Bayes? I didn't see anything explicit in the issues list. I was working on a port of my Python implementation.

e-dard avatar May 12 '14 21:05 e-dard

This is what I've seen so far, but it seems pretty nascent. https://github.com/tncardoso/golearn/tree/feature/naive/naive

Maybe it would be good to sync up with him.

sjwhitworth avatar May 12 '14 21:05 sjwhitworth

Any more thoughts? I think:

  • Random Forests
  • Naive Bayes
  • Stochastic/batch gradient descent
  • Cross Validation
  • Linear Regression
  • Logistic Regression

would be a great start.

sjwhitworth avatar May 13 '14 21:05 sjwhitworth
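
Since stochastic gradient descent underpins both of the linear models on that list, here is a minimal sketch of SGD applied to logistic regression, assuming a bias term is already appended to each feature row; this is illustrative only, not the eventual implementation:

```go
package main

import "math"

func sigmoid(z float64) float64 { return 1.0 / (1.0 + math.Exp(-z)) }

// sgdLogistic fits logistic regression weights by taking one gradient
// step per training example for a fixed number of epochs.
func sgdLogistic(X [][]float64, y []float64, lr float64, epochs int) []float64 {
	w := make([]float64, len(X[0]))
	for e := 0; e < epochs; e++ {
		for i, row := range X {
			var z float64
			for j, x := range row {
				z += w[j] * x
			}
			err := sigmoid(z) - y[i] // gradient of the log loss w.r.t. z
			for j, x := range row {
				w[j] -= lr * err * x
			}
		}
	}
	return w
}
```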

So we now have everything else on that list, just leaving:

  • Linear regression
  • Logistic regression

Is the end of June a good target?

Sentimentron avatar May 28 '14 13:05 Sentimentron

That sounds good to me. Logistic regression should be ready to merge after @npbool makes some changes. That only leaves linear regression.

sjwhitworth avatar May 28 '14 14:05 sjwhitworth

I think we've actually merged everything in that list.

Sentimentron avatar Aug 03 '14 22:08 Sentimentron

Reckon we're ready to go for a first proper release? Brilliant work @Sentimentron + all.

sjwhitworth avatar Aug 10 '14 14:08 sjwhitworth

Are we going to tag before or after #62?

Sentimentron avatar Aug 10 '14 15:08 Sentimentron

I don't mind. It all looked good to me.

sjwhitworth avatar Aug 10 '14 16:08 sjwhitworth