# Scanet

Type-safe, high-performance, distributed neural networks in Scala (not Python, finally...).
## Intro
Low-level (linear algebra) operations are powered by the low-level TensorFlow API (C/C++ bindings via JNI). Scala is used to build computation graphs and compile them into native tensor graphs. Compiled graphs are fully executed in native code (on CPU, GPU or TPU), and only the result is returned back via a `DirectBuffer` which points into native memory. The `DirectBuffer` is wrapped in a read-only `Tensor` object which allows slicing and reading the data in a convenient way (just like Breeze or NumPy do).
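To make the build-then-compile idea concrete, below is a minimal, self-contained sketch in plain Scala (illustrative only, not Scanet's actual API) of a computation graph that is described as data first and evaluated only after compilation; in Scanet the compiled graph would run in the native TensorFlow runtime rather than a Scala interpreter:

```scala
// A toy expression graph: built as data first, executed only after "compile".
// All names here are illustrative, not part of Scanet's API.
sealed trait Expr
case class Const(value: Float)          extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

object GraphSketch extends App {
  // In Scanet this step would hand the graph over to native TensorFlow;
  // here we simply interpret it in Scala to show the overall shape.
  def compile(expr: Expr): () => Float = { () =>
    def eval(e: Expr): Float = e match {
      case Const(v)  => v
      case Add(l, r) => eval(l) + eval(r)
      case Mul(l, r) => eval(l) * eval(r)
    }
    eval(expr)
  }

  val graph = Add(Mul(Const(2f), Const(3f)), Const(1f)) // 2 * 3 + 1
  val run   = compile(graph)                            // compile once, run many times
  println(run())                                        // 7.0
}
```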
The optimizer is built on top of Spark and can optimize a model in a distributed/parallel way. The chosen algorithm is data parallelism with synchronous model averaging: the dataset is split between the workers, each epoch runs independently on each split, and at the end of every epoch the parameters are averaged and broadcast back to each worker.
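Here is a minimal sketch of that scheme in plain Scala (a toy stand-in, not Scanet's or Spark's internals): each worker trains on its own split for an epoch, then the parameter vectors are averaged element-wise and redistributed:

```scala
// Data parallelism with synchronous model averaging, using a fake
// one-step "training" function; all names are illustrative only.
object ModelAveragingSketch extends App {
  type Weights = Array[Float]

  // stand-in for one epoch of local training on a single worker's split
  def trainEpoch(weights: Weights, split: Seq[Float]): Weights =
    weights.map(w => w - 0.01f * (split.sum / split.size)) // fake gradient step

  // average the parameter vectors element-wise across all workers
  def average(all: Seq[Weights]): Weights =
    all.transpose.map(ws => ws.sum / ws.size).toArray

  val splits  = Seq(Seq(1f, 2f), Seq(3f, 4f), Seq(5f, 6f)) // dataset split across 3 workers
  var weights = Array(0f, 0f)
  for (_ <- 1 to 5) {
    // each worker trains independently on its split (in parallel on a real cluster)
    val updated = splits.map(split => trainEpoch(weights, split))
    weights = average(updated) // synchronous averaging, then broadcast back
  }
  println(weights.mkString(", "))
}
```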
The input data is expected to be a `Dataset[Array[TensorType]]` which carries the shape of the tensors in its metadata. Usually `TensorType` is chosen to be `Float`, since it performs best on GPU; `Double` can be used as well.
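As a hedged illustration of what such an input could look like (the column name and the `shape` metadata key are assumptions for this sketch, not Scanet's documented contract), a Spark `Dataset` of flattened `Float` arrays can carry the tensor shape in its column metadata:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.MetadataBuilder

object DatasetSketch extends App {
  val spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()
  import spark.implicits._

  // one flattened 28x28 image as a Float array
  val raw = Seq(Array.fill(28 * 28)(0f)).toDF("features")

  // attach the tensor shape to the column metadata
  val shape = new MetadataBuilder().putLongArray("shape", Array(28L, 28L)).build()
  val ds: Dataset[Array[Float]] =
    raw.withColumn("features", raw("features").as("features", shape))
      .as[Array[Float]]

  println(ds.schema("features").metadata) // {"shape":[28,28]}
  spark.stop()
}
```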
## Examples

### ANN

Example of a simple MNIST classifier with a fully connected neural network:
```scala
val (trainingDs, testDs) = MNIST.load(sc, trainingSize = 30000)
val model = Dense(50, Sigmoid) >> Dense(10, Softmax)
val trained = trainingDs.train(model)
  .loss(CategoricalCrossentropy)
  .using(Adam(0.01f))
  .batch(1000)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .each(10.epochs, RecordAccuracy(testDs, tensorboard = true))
  .stopAfter(200.epochs)
  .run()
accuracy(trained, testDs) should be >= 0.95f
```
Here, loss and accuracy are logged and added to TensorBoard as live trends. To run TensorBoard, execute:

```
pip install tensorboard
tensorboard --logdir board
```
### CNN

The same, but with a CNN (Convolutional Neural Network):
```scala
val (trainingDs, testDs) = MNIST()
val model =
  Conv2D(32, activation = ReLU()) >> Pool2D() >>
  Conv2D(64, activation = ReLU()) >> Pool2D() >>
  Flatten >> Dense(10, Softmax)
val trained = trainingDs
  .train(model)
  .loss(CategoricalCrossentropy)
  .using(Adam(0.001f))
  .batch(100)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .each(1.epochs, RecordAccuracy(testDs, tensorboard = true))
  .stopAfter(3.epochs)
  .run()
accuracy(trained, testDs) should be >= 0.98f
```
### RNN

An LSTM layer to forecast sunspots:
```scala
val Array(train, test) = monthlySunspots(12).randomSplit(Array(0.8, 0.2), 1)
val model = LSTM(2) >> Dense(1, Tanh)
val trained = train
  .train(model)
  .loss(MeanSquaredError)
  .using(Adam())
  .batch(10)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .stopAfter(100.epochs)
  .run()
RMSE(trained, test) should be < 0.2f
R2Score(trained, test) should be > 0.8f
```
## Road Map

### TensorFlow Low Level API
- [x] Tensor
- [x] DSL for computation DAG
- [x] TF Session
- [x] Core ops
- [x] Math ops
- [x] Logical ops
- [x] String ops
- [x] TF Functions, Placeholders, Session caching
- [x] TensorBoard basic support
### Optimizer engine
- [x] Spark
- [ ] Hyperparameter tuning
- [ ] Model Import/Export
### Optimizer algorithms
- [x] SGD
- [x] AdaGrad
- [x] AdaDelta
- [x] RMSProp
- [x] Adam
- [x] Nadam
- [x] Adamax
- [x] AMSGrad
### Statistics
- [x] Variance/STD
- [ ] Covariance/Correlation Matrix
- [ ] Lots of other useful algorithms to analyze the dataset
### Models
- [x] Linear Regression
- [x] Binary Logistic Regression
- [x] ANN (Multilayer Perceptron NN)
- [x] Kernel regularization
- [x] Convolutional NN
- [x] Recurrent NN (Simple, LSTM)
- [x] Recurrent NN Enhancements
  - Add GRU Cell
  - Add LSTM GPU implementation, see `tensorflow/python/keras/layers/recurrent_v2.py` line 1655
  - Add RNN unroll option, see `tf.while_loop`
  - Add state between batches
  - Try LSTM weights fusing (4x fewer weights)
- [ ] Layers Dropout (provide random generator to layers) (Wanted!)
- [x] Batch Normalization
- [ ] others
### Localization & Object Detection & Instance Segmentation
- [ ] Object Localization
- [ ] Region Proposals (Selective Search, EdgeBoxes, etc.)
- [ ] R-CNN
- [ ] Fast R-CNN
- [ ] Faster R-CNN
- [ ] YOLO (You only look once)
- [ ] SSD (Single-Shot MultiBox Detector)
### Activation functions
- [x] Sigmoid
- [x] Tanh
- [x] ReLU
- [x] Softmax
- [ ] Exp
- [ ] SELU
- [ ] ELU
- [ ] Softplus
### Loss functions
- [x] MSE (Mean Squared Error)
- [x] Binary Crossentropy
- [x] Categorical Crossentropy
- [ ] Sparse Categorical Crossentropy
### Benchmark Datasets
- [ ] Boston Housing price regression dataset
- [x] MNIST
- [ ] Fashion MNIST
- [ ] CIFAR-10
- [ ] CIFAR-100
- [ ] ILSVRC (ImageNet-1000)
- [ ] Pascal VOC
### Preprocessing
- [ ] SVD/PCA/Whitening
- [ ] Feature scalers
- [ ] Feature embedding
- [ ] Hashed features
- [ ] Crossed features
### Estimators

- [x] R2 score
- [x] Accuracy estimator
- [ ] Confusion matrix, precision, recall, F1 score
- [ ] Runtime estimation and a new stop condition based on it
### Benchmarks
- [x] LeNet
- [ ] AlexNet
- [ ] ZF Net
- [ ] VGGNet
- [ ] GoogLeNet
- [ ] ResNet
- [ ] ...
### CPU vs GPU vs TPU

- [ ] Create a computation-intensive operation, like running `matmul` multiple times on large tensors, and compare with Scala `breeze`, Python `tensorflow` and Python `numpy`
- [ ] Compare with existing implementations using local CPU
- [ ] Compare with existing implementations using one GPU
- [ ] Compare with existing implementations using distributed mode on GCP DataProc
### Other useful things
- [ ] While training, analyze the weight histograms to make sure the deep NN does not saturate
- [ ] Grid/Random hyperparameter search
- [x] Different weight initializers (Xavier)
- [ ] Decay learning rate over time (step, exponential, 1/t decay)
- [ ] Try using in interactive notebook
- [ ] Add a graph library so we could plot some charts and publish them in TensorBoard or a notebook (maybe fork and upgrade `vegas` to Scala 2.12, or try `evil-plot`)
### Refactoring

- Redefine the way we train a model on a dataset and make a prediction. We should cover 2 cases: Big Data with `spark`, which can train and predict on large datasets, and single (batch) prediction without the `spark` dependency (to be able to expose a model via an API or use it in real time). For that we need to:
  - separate the project into `core` + `spark` modules
  - implement model weights export/import
  - implement feature preprocessing; for the training use case try using `MLlib`, yet we need to figure out how to transform features via a regular function without `spark` involved
  - integrating with `MLlib` might require redefining the `Dataset[Record[A]]` we have right now; it is probably better to use any abstract dataset which contains 2 required columns, `features` + `labels`, for training and `features` for prediction
- Add a DSL to build tensor requirements, like `tensor require rank(4)` or `tensor require shape squaredMatrix`
- We might need to define a high-level untyped trait `Node` which the `Expr[A]` trait will extend. Such a `Node` will have a defined compiler; to make it an `Expr` we would need to choose an output and assign a type
If you want to become a contributor, you are welcome! You can pick anything from the Road Map or propose your own idea.

Please, contact: