# Scanet

Type-safe, high-performance, distributed neural networks in Scala (not Python, finally...).
## Intro
Low-level (linear algebra) operations are powered by the low-level TensorFlow API (C/C++ bindings via JNI). Scala is used to build computation graphs and compile them into native tensor graphs. Compiled graphs are fully executed in native code (on CPU, GPU or TPU), and only the result is returned back via a `DirectBuffer` which points into native memory. The `DirectBuffer` is wrapped in a read-only `Tensor` object which allows slicing and reading the data in a convenient way (just like Breeze or NumPy do).
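To make the build-then-compile idea concrete, below is a minimal, self-contained sketch in plain Scala (illustrative only, not Scanet's actual API) of a computation graph that is described as data first and evaluated only after compilation; in Scanet the compiled graph would run in the native TensorFlow runtime rather than a Scala interpreter:

```scala
// A toy expression graph: built as data first, executed only after "compile".
// All names here are illustrative, not part of Scanet's API.
sealed trait Expr
case class Const(value: Float)          extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

object GraphSketch extends App {
  // In Scanet this step would hand the graph over to native TensorFlow;
  // here we simply interpret it in Scala to show the overall shape.
  def compile(expr: Expr): () => Float = { () =>
    def eval(e: Expr): Float = e match {
      case Const(v)  => v
      case Add(l, r) => eval(l) + eval(r)
      case Mul(l, r) => eval(l) * eval(r)
    }
    eval(expr)
  }

  val graph = Add(Mul(Const(2f), Const(3f)), Const(1f)) // 2 * 3 + 1
  val run   = compile(graph)                            // compile once, run many times
  println(run())                                        // 7.0
}
```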
The optimizer is built on top of Spark and can optimize a model in a distributed/parallel way. The chosen algorithm is data parallelism with synchronous model averaging: the dataset is split between the workers, each epoch runs independently on each split, and at the end of every epoch the parameters are averaged and broadcast back to each worker.
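Here is a minimal sketch of that scheme in plain Scala (a toy stand-in, not Scanet's or Spark's internals): each worker trains on its own split for an epoch, then the parameter vectors are averaged element-wise and redistributed:

```scala
// Data parallelism with synchronous model averaging, using a fake
// one-step "training" function; all names are illustrative only.
object ModelAveragingSketch extends App {
  type Weights = Array[Float]

  // stand-in for one epoch of local training on a single worker's split
  def trainEpoch(weights: Weights, split: Seq[Float]): Weights =
    weights.map(w => w - 0.01f * (split.sum / split.size)) // fake gradient step

  // average the parameter vectors element-wise across all workers
  def average(all: Seq[Weights]): Weights =
    all.transpose.map(ws => ws.sum / ws.size).toArray

  val splits  = Seq(Seq(1f, 2f), Seq(3f, 4f), Seq(5f, 6f)) // dataset split across 3 workers
  var weights = Array(0f, 0f)
  for (_ <- 1 to 5) {
    // each worker trains independently on its split (in parallel on a real cluster)
    val updated = splits.map(split => trainEpoch(weights, split))
    weights = average(updated) // synchronous averaging, then broadcast back
  }
  println(weights.mkString(", "))
}
```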
The input data is expected to be a `Dataset[Array[TensorType]]` which carries the shape of the tensors in its metadata. Usually `TensorType` is chosen to be `Float`, since it performs best on GPU; `Double` can be used as well.
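As a hedged illustration of what such an input could look like (the column name and the `shape` metadata key are assumptions for this sketch, not Scanet's documented contract), a Spark `Dataset` of flattened `Float` arrays can carry the tensor shape in its column metadata:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.MetadataBuilder

object DatasetSketch extends App {
  val spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()
  import spark.implicits._

  // one flattened 28x28 image as a Float array
  val raw = Seq(Array.fill(28 * 28)(0f)).toDF("features")

  // attach the tensor shape to the column metadata
  val shape = new MetadataBuilder().putLongArray("shape", Array(28L, 28L)).build()
  val ds: Dataset[Array[Float]] =
    raw.withColumn("features", raw("features").as("features", shape))
      .as[Array[Float]]

  println(ds.schema("features").metadata) // {"shape":[28,28]}
  spark.stop()
}
```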
## Examples

### ANN

Example of a simple MNIST classifier with a fully connected neural network:
```scala
val (trainingDs, testDs) = MNIST.load(sc, trainingSize = 30000)
val model = Dense(50, Sigmoid) >> Dense(10, Softmax)
val trained = trainingDs.train(model)
  .loss(CategoricalCrossentropy)
  .using(Adam(0.01f))
  .batch(1000)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .each(10.epochs, RecordAccuracy(testDs, tensorboard = true))
  .stopAfter(200.epochs)
  .run()
accuracy(trained, testDs) should be >= 0.95f
```
Here, loss and accuracy are logged and added to TensorBoard as live trends. To run TensorBoard, execute:

```
pip install tensorboard
tensorboard --logdir board
```
### CNN

The same, but with a CNN (Convolutional Neural Network):
```scala
val (trainingDs, testDs) = MNIST()
val model =
  Conv2D(32, activation = ReLU()) >> Pool2D() >>
  Conv2D(64, activation = ReLU()) >> Pool2D() >>
  Flatten >> Dense(10, Softmax)
val trained = trainingDs
  .train(model)
  .loss(CategoricalCrossentropy)
  .using(Adam(0.001f))
  .batch(100)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .each(1.epochs, RecordAccuracy(testDs, tensorboard = true))
  .stopAfter(3.epochs)
  .run()
accuracy(trained, testDs) should be >= 0.98f
```
### RNN

An LSTM layer to forecast sunspots:
```scala
val Array(train, test) = monthlySunspots(12).randomSplit(Array(0.8, 0.2), 1)
val model = LSTM(2) >> Dense(1, Tanh)
val trained = train
  .train(model)
  .loss(MeanSquaredError)
  .using(Adam())
  .batch(10)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .stopAfter(100.epochs)
  .run()
RMSE(trained, test) should be < 0.2f
R2Score(trained, test) should be > 0.8f
```
## Road Map

### TensorFlow Low Level API
- [x] Tensor
- [x] DSL for computation DAG
- [x] TF Session
- [x] Core ops
- [x] Math ops
- [x] Logical ops
- [x] String ops
- [x] TF Functions, Placeholders, Session caching
- [x] TensorBoard basic support
### Optimizer engine
- [x] Spark
- [ ] Hyperparameter tuning
- [ ] Model Import/Export
### Optimizer algorithms
- [x] SGD
- [x] AdaGrad
- [x] AdaDelta
- [x] RMSProp
- [x] Adam
- [x] Nadam
- [x] Adamax
- [x] AMSGrad
### Statistics
- [x] Variance/STD
- [ ] Covariance/Correlation Matrix
- [ ] Lots of other useful algorithms to analyze the dataset
### Models
- [x] Linear Regression
- [x] Binary Logistic Regression
- [x] ANN (Multilayer Perceptron NN)
- [x] Kernel regularization
- [x] Convolutional NN
- [x] Recurrent NN (Simple, LSTM)
- [x] Recurrent NN Enhancements
  - Add GRU Cell
  - Add LSTM GPU implementation, see `tensorflow/python/keras/layers/recurrent_v2.py` line 1655
  - Add RNN unroll option, see `tf.while_loop`
  - Add state between batches
  - Try LSTM weights fusing (4x fewer weights)
- [ ] Layers Dropout (provide random generator to layers) (Wanted!)
- [x] Batch Normalization
- [ ] others
### Localization & Object Detection & Instance Segmentation
- [ ] Object Localization
- [ ] Region Proposals (Selective Search, EdgeBoxes, etc.)
- [ ] R-CNN
- [ ] Fast R-CNN
- [ ] Faster R-CNN
- [ ] YOLO (You only look once)
- [ ] SSD (Single-Shot MultiBox Detector)
### Activation functions
- [x] Sigmoid
- [x] Tanh
- [x] ReLU
- [x] Softmax
- [ ] Exp
- [ ] SELU
- [ ] ELU
- [ ] Softplus
### Loss functions
- [x] MSE (Mean Squared Error)
- [x] Binary Crossentropy
- [x] Categorical Crossentropy
- [ ] Sparse Categorical Crossentropy
### Benchmark Datasets
- [ ] Boston Housing price regression dataset
- [x] MNIST
- [ ] Fashion MNIST
- [ ] CIFAR-10
- [ ] CIFAR-100
- [ ] ILSVRC (ImageNet-1000)
- [ ] Pascal VOC
### Preprocessing
- [ ] SVD/PCA/Whitening
- [ ] Feature scalers
- [ ] Feature embedding
- [ ] Hashed features
- [ ] Crossed features
### Estimators

- [x] R2 score
- [x] Accuracy estimator
- [ ] Confusion matrix, precision, recall, F1 score
- [ ] Runtime estimation and a new stop condition based on it
### Benchmarks
- [x] LeNet
- [ ] AlexNet
- [ ] ZF Net
- [ ] VGGNet
- [ ] GoogLeNet
- [ ] ResNet
- [ ] ...
### CPU vs GPU vs TPU

- [ ] Create a computation-intensive operation, like running `matmul` multiple times on large tensors, and compare with Scala `breeze`, Python `tensorflow` and Python `numpy`
- [ ] Compare with existing implementations using local CPU
- [ ] Compare with existing implementations using one GPU
- [ ] Compare with existing implementations using distributed mode on GCP DataProc
### Other useful things
- [ ] While training, analyze the weight histograms to make sure the deep NN does not saturate
- [ ] Grid/Random hyperparameter search
- [x] Different weight initializers (Xavier)
- [ ] Decay learning rate over time (step, exponential, 1/t decay)
- [ ] Try using in interactive notebook
- [ ] Add a graph library so we could plot some charts and publish them in TensorBoard or a notebook (maybe fork and upgrade `vegas` to Scala 2.12, or try `evil-plot`)
### Refactoring

- Redefine the way we train a model on a dataset and make a prediction. We should cover 2 cases: Big Data with `spark`, which can train and predict on large datasets, and single (batch) prediction without the `spark` dependency (to be able to expose a model via an API or use it in real time). For that we need to:
  - separate the project into `core` + `spark` modules
  - implement model weights export/import
  - implement feature preprocessing; for the training use case try using `MLlib`, yet we need to figure out how to transform features via a regular function without `spark` involved
  - integrating with `MLlib` might require redefining the `Dataset[Record[A]]` we have right now; it is probably better to use any abstract dataset which contains 2 required columns, `features` + `labels`, for training and `features` for prediction
- Add a DSL to build tensor requirements, like `tensor require rank(4)` or `tensor require shape squaredMatrix`
- We might need to define a high-level untyped trait `Node` which the `Expr[A]` trait will extend. Such a `Node` will have a defined compiler; to make it an `Expr` we would need to choose an output and assign a type
If you want to become a contributor, you are welcome! You can pick anything from the Road Map or propose your own idea.

Please, contact: