scicloj.ml
A Clojure machine learning library
An idiomatic Clojure machine learning library.
Main features:
- Harmonized and idiomatic use of various classification, regression and unsupervised models
- Supports creation of machine learning pipelines as-data
- Includes easy-to-use, sophisticated cross-validations of pipelines
- Includes the most important data transformations for preprocessing
- Open to pluggable ML experiment tracking
- Open architecture that allows plugging in any ML model, even ones from non-JVM languages, including deep learning
- Based on well-established Clojure/Java data science libraries:
  - tech.ml.dataset for very efficient underlying data storage
  - Smile for ML models
  - tech.ml as the foundation of higher-level ML functions
Quickstart
Dependencies:
{:deps
{scicloj/scicloj.ml {:mvn/version "0.2.0"}}}
Code:
(require '[scicloj.ml.core :as ml]
         '[scicloj.ml.metamorph :as mm]
         '[scicloj.ml.dataset :as ds])

;; read train and test datasets
(def titanic-train
  (ds/dataset "https://github.com/scicloj/metamorph-examples/raw/main/data/titanic/train.csv"
              {:key-fn keyword :parser-fn :string}))

(def titanic-test
  (-> "https://github.com/scicloj/metamorph-examples/raw/main/data/titanic/test.csv"
      (ds/dataset {:key-fn keyword :parser-fn :string})
      (ds/add-column :Survived [""] :cycle)))

;; construct pipeline function including Logistic Regression model
(def pipe-fn
  (ml/pipeline
   (mm/select-columns [:Survived :Pclass])
   (mm/add-column :Survived (fn [ds] (map #(case % "1" "yes" "0" "no" nil "") (:Survived ds))))
   (mm/categorical->number [:Survived :Pclass])
   (mm/set-inference-target :Survived)
   {:metamorph/id :model}
   (mm/model {:model-type :smile.classification/logistic-regression})))

;; execute pipeline with train data including model in mode :fit
(def trained-ctx
  (pipe-fn {:metamorph/data titanic-train
            :metamorph/mode :fit}))

;; execute pipeline in mode :transform with test data which will do a prediction
(def test-ctx
  (pipe-fn
   (assoc trained-ctx
          :metamorph/data titanic-test
          :metamorph/mode :transform)))

;; extract prediction from pipeline function result
(-> test-ctx :metamorph/data
    (ds/column-values->categorical :Survived))
;; => #tech.v3.dataset.column<string>[418]
;; :Survived
;; [no, no, yes, no, no, no, no, yes, no, no, no, no, no, yes, no, yes, yes, no, no, no...]
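
The fit/transform pattern used in the quickstart can be illustrated without any library. The following is a minimal sketch in plain Clojure, not the scicloj.ml or metamorph implementation. It assumes that a pipeline step is a function from a context map to a context map; the step `mean-center`, the helper `run-pipeline`, and the plain keys `:data` and `:mode` are hypothetical names chosen for illustration.

```clojure
;; Conceptual sketch only: NOT how scicloj.ml / metamorph is implemented.
;; A step is a function ctx -> ctx. In :fit mode it learns some state and
;; stores it in the context under its id; in :transform mode it reuses it.

(defn mean-center
  "Hypothetical step: subtracts the mean that was learned in :fit mode."
  [id]
  (fn [{:keys [data mode] :as ctx}]
    (let [mean (if (= mode :fit)
                 (/ (reduce + data) (count data)) ; learn from training data
                 (get ctx id))]                   ; reuse the learned value
      (assoc ctx
             id mean
             :data (mapv #(- % mean) data)))))

(defn run-pipeline
  "Hypothetical helper: threads the context through all steps in order."
  [steps ctx]
  (reduce (fn [c step] (step c)) ctx steps))

(def steps [(mean-center :center)])

;; fit on training data
(def fitted
  (run-pipeline steps {:data [1 2 3] :mode :fit}))
;; (:center fitted) => 2
;; (:data fitted)   => [-1 0 1]

;; reuse the fitted context on new data, switching only data and mode
(def transformed
  (run-pipeline steps (assoc fitted :data [2 4] :mode :transform)))
;; (:data transformed) => [0 2]
```

As in the quickstart, the second run starts from the fitted context and replaces only the data and the mode, so the state learned during fitting is available when transforming.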
Community
For support, use Clojurians on Zulip or Clojurians Slack.
Documentation
Full documentation is available as user guides.
API documentation: https://scicloj.github.io/scicloj.ml
Projects that scicloj.ml uses or is based on:
This library itself is a shim and does not contain any functions of its own.
The code lives in the following repositories, and the functions are re-exported in scicloj.ml in a small number of namespaces for user convenience.
- https://github.com/techascent/tech.ml
- https://github.com/scicloj/tablecloth
- https://github.com/scicloj/metamorph
- https://github.com/scicloj/metamorph.ml
- https://github.com/techascent/tech.ml.dataset
- https://github.com/scicloj/scicloj.ml.smile
- https://github.com/scicloj/scicloj.ml.xgboost
- https://github.com/haifengl/smile
scicloj.ml organises the existing code in three namespaces, as follows:
namespace scicloj.ml.core
Functions are re-exported from:
- scicloj.metamorph.ml.*
- scicloj.metamorph.core
namespace scicloj.ml.dataset
Functions are re-exported from:
- tablecloth.api
- tech.v3.dataset.modelling
- tech.v3.dataset.column-filters
namespace scicloj.ml.metamorph
Functions are re-exported from:
- tablecloth.pipeline
- tech.v3.libs.smile.metamorph
- scicloj.metamorph.ml
- tech.v3.dataset.metamorph
If you are already familiar with any of the original namespaces, you can of course use them directly as well:
(require '[tablecloth.api :as tc])
(tc/add-column ...)
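
A slightly fuller version of the direct-use case is sketched below. It assumes that tablecloth is on the classpath (it is listed above as one of the projects scicloj.ml is based on); the dataset contents and the names `people` and `people-with-age` are made up for illustration.

```clojure
(require '[tablecloth.api :as tc])

;; build a small in-memory dataset from a map of column name -> values
(def people
  (tc/dataset {:name ["Ann" "Bob" "Cid"]}))

;; add a new column from a sequence of the same length
(def people-with-age
  (tc/add-column people :age [31 45 27]))

;; inspect the resulting columns and row count
(tc/column-names people-with-age)
(tc/row-count people-with-age)
```

The same functions should also be reachable through the `scicloj.ml.dataset` alias, since that namespace re-exports `tablecloth.api`.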
Plugins
scicloj.ml can be easily extended by plugins, which contribute models. So far, the following plugins exist:
- Builtin: scicloj.ml.smile
- Builtin: scicloj.ml.xgboost
- All sklearn models: sklearn.clj
- top2vec model: scicloj.ml.top2vec
- crf: An NER model from Stanford NLP
- clj-djl Use fasttext model from djl