Overarching Plan (to MVP / version 1)

Open dcrescim opened this issue 4 years ago • 4 comments

Hey Folks! I thought it might be a bit easier if we had one issue that had the current "state of the world". It would have a list of all completed Estimators/Functions and next to each it would have a person's name if someone was working on it or it'd be checked if it was complete and merged in dev.

Ping me in the comments beneath and I'll add you to whichever estimators you want to work on.

I went through the scikit-learn docs yesterday and broke out the Estimators that we would need for an MVP of scikit.js (let's call it version 1).

Version 1

The focus here is on simple models, plus all the preprocessing and metrics you'd need to build high-quality models.

linear_model

  • [x] LinearRegression
  • [x] LassoRegression
  • [x] RidgeRegression
  • [x] ElasticNet
  • [x] LogisticRegression
  • [x] SGDClassifier
  • [x] SGDRegressor

cluster

  • [x] KMeans

neighbors

  • [x] #37
  • [x] #38

dummy

  • [x] DummyClassifier
  • [x] DummyRegressor

impute

  • [x] SimpleImputer

preprocessing

  • [x] StandardScaler
  • [x] MinMaxScaler
  • [x] MaxAbsScaler
  • [x] Normalizer
  • [x] RobustScaler
  • [x] LabelEncoder
  • [x] OneHotEncoder
  • [x] OrdinalEncoder

pipeline

  • [x] Pipeline

compose

  • [x] ColumnTransformer

tree

  • [x] #39
  • [x] #40

metrics

  • [x] accuracyScore
  • [x] confusionMatrix
  • [x] hingeLoss
  • [x] logLoss
  • [x] precisionScore
  • [x] recallScore
  • [x] rocAucScore
  • [x] zeroOneLoss
  • [x] meanAbsoluteError
  • [x] meanSquaredError
  • [x] meanSquaredLogError
  • [x] r2Score
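
For anyone picking up a metric: most of these are small pure functions over two equally long arrays. Here is a rough sketch of three of them. The function names follow the checklist above, but the signatures (plain `number[]` inputs) and the `assertSameLength` helper are assumptions made for illustration, not the scikit.js API.

```typescript
// Hypothetical standalone sketches; signatures are assumptions, not the scikit.js API.
function assertSameLength(yTrue: number[], yPred: number[]): void {
  if (yTrue.length === 0 || yTrue.length !== yPred.length) {
    throw new Error("yTrue and yPred must be non-empty and equally long");
  }
}

// Fraction of predictions that exactly match the true labels.
function accuracyScore(yTrue: number[], yPred: number[]): number {
  assertSameLength(yTrue, yPred);
  let correct = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === yPred[i]) correct++;
  }
  return correct / yTrue.length;
}

// Average of the squared residuals.
function meanSquaredError(yTrue: number[], yPred: number[]): number {
  assertSameLength(yTrue, yPred);
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) {
    const diff = yTrue[i] - yPred[i];
    sum += diff * diff;
  }
  return sum / yTrue.length;
}

// Coefficient of determination: 1 - SS_res / SS_tot.
// Note: a constant yTrue makes SS_tot zero; this sketch does not special-case that.
function r2Score(yTrue: number[], yPred: number[]): number {
  assertSameLength(yTrue, yPred);
  let mean = 0;
  for (let i = 0; i < yTrue.length; i++) mean += yTrue[i] / yTrue.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < yTrue.length; i++) {
    ssRes += (yTrue[i] - yPred[i]) * (yTrue[i] - yPred[i]);
    ssTot += (yTrue[i] - mean) * (yTrue[i] - mean);
  }
  return 1 - ssRes / ssTot;
}

const acc = accuracyScore([0, 1, 2, 3], [0, 2, 1, 3]); // 0.5
const mse = meanSquaredError([3, -0.5, 2, 7], [2.5, 0, 2, 8]); // 0.375
const r2 = r2Score([3, -0.5, 2, 7], [2.5, 0, 2, 8]); // about 0.9486
console.log(acc, mse, r2);
```

The real implementations would presumably also accept tensors and extra options (sample weights, multi-output targets), which this sketch leaves out.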

So pick whichever ya want, and ping me, and I'll update the issue and put your name next to the Estimator / Function.

Some great resources for contributors

  • ML from scratch in Python : https://github.com/eriklindernoren/ML-From-Scratch
  • Nick McClure's Book: https://github.com/nfmcclure/tensorflow_cookbook
  • Charlie Gerard's Book: Practical Machine Learning in JavaScript: TensorFlow.js for Web Developers
  • MachineLearnjs : https://github.com/machinelearnjs/machinelearnjs

Hello folks! Time flies when you're having fun :) We are rounding the corner on the completion of the MVP / Version 1 list above. I thought it would be good to go through scikit-learn and make a list of the next most important things. That list is below, along with some general todos (docs, tutorials). Feel free to ping me or comment below and grab whatever interests you from the following list.

Onward and Upward!

linear_model

  • [ ] Exact solution for linear_regression
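
To make the "exact solution" item concrete: one common closed-form approach is to solve the normal equations (X^T X) w = X^T y directly instead of running gradient descent. The sketch below does that with plain arrays and Gauss-Jordan elimination. `solveLinearRegression` is a hypothetical helper name, and whether scikit.js would use the normal equations, QR, or SVD is an open choice; this is only one option.

```typescript
// Closed-form least squares via the normal equations (X^T X) w = X^T y.
// Hypothetical helper for illustration, not a scikit.js API.
function solveLinearRegression(X: number[][], y: number[]): number[] {
  const n = X.length;
  if (n === 0 || n !== y.length) {
    throw new Error("X and y must be non-empty and have the same number of rows");
  }
  // Prepend a column of ones so the first weight acts as the intercept.
  const A = X.map((row) => [1].concat(row));
  const d = A[0].length;

  // Build the augmented system [X^T X | X^T y].
  const M: number[][] = [];
  for (let i = 0; i < d; i++) {
    const row: number[] = [];
    for (let j = 0; j <= d; j++) row.push(0);
    for (let k = 0; k < n; k++) {
      for (let j = 0; j < d; j++) row[j] += A[k][i] * A[k][j];
      row[d] += A[k][i] * y[k];
    }
    M.push(row);
  }

  // Gauss-Jordan elimination with partial pivoting.
  for (let col = 0; col < d; col++) {
    let pivot = col;
    for (let r = col + 1; r < d; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) {
      throw new Error("X^T X is singular; features are linearly dependent");
    }
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;
    for (let r = 0; r < d; r++) {
      if (r === col) continue;
      const factor = M[r][col] / M[col][col];
      for (let c = col; c <= d; c++) M[r][c] -= factor * M[col][c];
    }
  }
  // The system is now diagonal: w_i = M[i][d] / M[i][i].
  return M.map((row, i) => row[d] / row[i]);
}

// Data generated from y = 1 + 2x, so the result is [intercept, slope] close to [1, 2].
const weights = solveLinearRegression([[0], [1], [2], [3]], [1, 3, 5, 7]);
console.log(weights);
```

The normal equations are simple but can be numerically fragile when features are nearly collinear; a QR- or SVD-based solver would be the more robust route if this gets implemented for real.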

datasets

  • [x] Iris
  • [x] Boston Housing
  • [x] #44
  • [x] #50

naive_bayes

  • [x] #41
  • [x] #42

svm

  • [x] LinearSVC
  • [x] LinearSVR
  • [x] SVC
  • [x] SVR

model_selection

  • [ ] GroupKFold
  • [x] #46
  • [ ] ShuffleSplit
  • [x] #45
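
For whoever grabs ShuffleSplit: the idea is that each split is an independent random permutation of the row indices, cut into a test part and a train part. A minimal sketch follows, assuming a functional signature (`shuffleSplit(nSamples, nSplits, testSize, random)`) and a tiny `lcg` random generator that I made up for reproducibility; the actual scikit.js class would more likely mirror scikit-learn's `split` method.

```typescript
// Hypothetical sketch of ShuffleSplit; names and signature are assumptions.
interface Split {
  train: number[];
  test: number[];
}

// Tiny linear congruential generator so the example is reproducible.
function lcg(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function shuffleSplit(
  nSamples: number,
  nSplits: number,
  testSize: number,
  random: () => number = Math.random
): Split[] {
  if (testSize <= 0 || testSize >= 1) {
    throw new Error("testSize must be a fraction strictly between 0 and 1");
  }
  const nTest = Math.ceil(nSamples * testSize);
  if (nTest >= nSamples) {
    throw new Error("testSize leaves no samples for training");
  }
  const splits: Split[] = [];
  for (let s = 0; s < nSplits; s++) {
    const indices: number[] = [];
    for (let i = 0; i < nSamples; i++) indices.push(i);
    // Fisher-Yates shuffle of the row indices.
    for (let i = nSamples - 1; i > 0; i--) {
      const j = Math.floor(random() * (i + 1));
      const tmp = indices[i];
      indices[i] = indices[j];
      indices[j] = tmp;
    }
    // The first nTest shuffled indices form the test set, the rest the train set.
    splits.push({ test: indices.slice(0, nTest), train: indices.slice(nTest) });
  }
  return splits;
}

// Three random 80/20 splits over 10 rows.
const splits = shuffleSplit(10, 3, 0.2, lcg(42));
console.log(splits);
```

Unlike KFold, the test sets of different splits may overlap, since every split reshuffles from scratch.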

decomposition

  • [ ] PCA

hyper_parameter

  • [x] #187

ensemble

  • [x] VotingRegressor
  • [x] VotingClassifier
  • [ ] RandomForestClassifier
  • [ ] RandomForestRegressor

docs

  • [x] Make Basic Docs site
  • [x] Push the Basic Docs site to scikit.org. Have scikit.js redirect to scikit.org
  • [x] Make Basic Docs site show api for all functions / classes that we export
  • [x] Make it build browser and node versions
  • [x] Make the tests run against browser and node environments

dcrescim, Oct 11 '21 17:10

Thanks for creating this @dcrescim I'll add some more features we may need as well.

risenW, Oct 13 '21 08:10

One thing we should also be working towards is showing off the strength of machine learning in the browser: interactivity. We should build some kind of playground, similar to the TensorFlow Playground.

DirkToewe, Jan 16 '22 19:01

Quoting @DirkToewe: "One thing we should also be working towards is showing off the strength of machine learning in the browser: interactivity. We should build some kind of playground, similar to the TensorFlow Playground."

+1 on this from me. Another suggestion: we could have a bunch of drag-and-drop/select features where users can upload sample data, select an ML algorithm we support, and then run training and predictions on it.

cc @dcrescim @yawetse @Lewuathe @steveoni

risenW, Jan 18 '22 06:01

I totally agree with this. I wonder if there is a way we can support this on our docs site: just have a page at scikitjs.org/playground that is all set up for messing around with data. That makes it easier for us to build the playground as part of this git repo, so it is free and easy to deploy.

Couldn't agree more with the ideas above @DirkToewe @risenW

dcrescim, Jan 18 '22 22:01