jutsu.ai

Convert Clojure data -> NDArrayWritable

Open · hswick opened this issue 5 years ago · 2 comments

See here for a discussion on the desired feature.

In summary: it would be great to have a way to feed plain Clojure data into a model for training.

It seems like the best way to do this is to go from Clojure vectors to an NDArray to an NDArrayWritable. jutsu.matrix already provides a way to go from Clojure vectors to NDArrays, so the next step would be to convert the NDArray to an NDArrayWritable.
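Before either conversion, plain Clojure data has to be rectangular and purely numeric, with one consistent column order. A minimal sketch of that preparation step, assuming the features arrive as maps whose keys are comparable (such as keywords); `maps->rows` is a hypothetical helper, not part of jutsu.ai or jutsu.matrix:

```clojure
(defn maps->rows
  "Hypothetical helper (not part of jutsu.ai): converts a seq of feature maps
  into a vector of double vectors. Every map is read in the same key order
  (the sorted keys of the first map by default), so column i always holds the
  same feature. Keys outside that order are ignored; a missing key throws."
  ([features]
   (maps->rows features (sort (keys (first features)))))
  ([features ks]
   (mapv (fn [m]
           (mapv (fn [k]
                   (if (contains? m k)
                     (double (get m k))
                     (throw (ex-info "Missing feature key" {:key k :row m}))))
                 ks))
         features)))
```

For example, `(maps->rows [{:y 2 :x 1} {:x 3 :y 4}])` yields `[[1.0 2.0] [3.0 4.0]]` even though the two maps were written with different key orders; the resulting nested vectors are what one would then hand to an NDArray constructor.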

I'm posting this as a separate issue because I think it would be a great first issue for someone to tackle, and it would be very helpful.

hswick, Jul 09 '18 03:07

I did something like this in parts. The snippet below takes "features" as a vector of maps and "y" as a vector of targets, creates an NDArray, then a DataSet, and then a DataSetIterator that can be used with the existing "train-net!" function.

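(import '[org.nd4j.linalg.factory Nd4j]
        '[org.nd4j.linalg.dataset DataSet ViewIterator])

;; features: vector of maps with the same keys; y: vector of numeric targets.
;; Read every map in one fixed key order, since (vals m) order can differ between maps.
(def feature-keys (keys (first features)))
(def data (Nd4j/create (into-array (map #(double-array (map % feature-keys)) features))))
(def labels (Nd4j/create (into-array (map #(double-array [%]) y)))) ; n x 1 column
(def data-set (DataSet. data labels))
(.shuffle data-set)                                   ; shuffle before splitting
(def train-test (.splitTestAndTrain data-set 0.8))   ; 80% train / 20% test
(def train-data-set (.getTrain train-test))
(def test-data-set (.getTest train-test))
(def dataset-iterator (ViewIterator. train-data-set 1000)) ; mini-batches of 1000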

What makes it complex is that the logic for

  • epochs
  • batches vs. mini-batches
  • in-memory vs stream data from disk
  • test-train split
  • normalization

is all encapsulated in, or built on, the DataSetIterator interface. That interface has many implementations, and the user needs to be able to choose and configure them.
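Some of these concerns are easy to express as pure functions when the data fits in memory. As an illustration (a plain-Clojure stand-in, not DL4J's own normalization API), here is a sketch of per-column standardization with the hypothetical names `fit-standardizer` and `standardize`: the statistics are fit once and then applied to any split.

```clojure
(defn fit-standardizer
  "Hypothetical helper: per-column mean and population standard deviation
  of a non-empty vector of numeric row vectors."
  [rows]
  {:pre [(seq rows)]}
  (let [n     (count rows)
        cols  (apply map vector rows)              ; transpose: rows -> columns
        mean  (fn [xs] (/ (reduce + 0.0 xs) n))
        means (mapv mean cols)
        stds  (mapv (fn [xs mu]
                      (Math/sqrt (mean (map #(let [d (- % mu)] (* d d)) xs))))
                    cols means)]
    {:means means :stds stds}))

(defn standardize
  "Applies (x - mean) / std to every column; a constant column
  (std = 0) is only centered, to avoid dividing by zero."
  [{:keys [means stds]} rows]
  (mapv (fn [row]
          (mapv (fn [x mu sd] (if (zero? sd) (- x mu) (/ (- x mu) sd)))
                row means stds))
        rows))
```

Fitting on the training rows and reusing the result for the test rows keeps test-set statistics out of training, which is why normalization is entangled with the test-train split in the list above.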

To have this functionality fully on the Clojure side, many of the DataSetIterator implementations would need to be ported to Clojure: https://deeplearning4j.org/api/latest/org/nd4j/linalg/dataset/api/iterator/DataSetIterator.html

And that code is written in a very stateful OO style, so it is hard to make functional.

But maybe a single function implementing "one scenario", like

(defn iterate-data [features labels test-train-split-percentage num-batches num-epochs] ...)
;; features: vector of maps, labels: vector
;; returns 2 DataSetIterators (one for train, one for test)

could be useful. It would not require any changes to existing functions, since they can already handle any DataSetIterator.
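A pure-data version of that "one scenario" can be sketched without any DL4J types, which also shows how epochs, mini-batches and the split look without mutable iterator state. This is a hypothetical sketch, not an existing jutsu.ai function. It assumes the features have already been turned into numeric row vectors, reads test-train-split-percentage as the share used for training, and returns sequences of `[feature-rows label-values]` mini-batches instead of DataSetIterators.

```clojure
(defn iterate-data
  "Hypothetical pure-data sketch of the proposed function (not part of jutsu.ai).
  features: vector of numeric row vectors; labels: vector of targets.
  test-train-split-percentage: share of examples (0-100) used for training.
  Returns {:train lazy seq of [feature-rows label-values] mini-batches,
           num-epochs passes with a fresh shuffle per pass,
           :test  [feature-rows label-values]}."
  [features labels test-train-split-percentage num-batches num-epochs]
  {:pre [(= (count features) (count labels))
         (<= 0 test-train-split-percentage 100)
         (pos? num-batches)]}
  (let [examples (shuffle (mapv vector features labels)) ; pair each row with its label
        n-train  (Math/round (double (* (count examples)
                                        (/ test-train-split-percentage 100.0))))
        train    (subvec examples 0 n-train)
        test     (subvec examples n-train)
        ;; mini-batch size chosen so one epoch has at most num-batches batches
        size     (max 1 (long (Math/ceil (/ (count train) (double num-batches)))))
        unzip    (fn [pairs] [(mapv first pairs) (mapv second pairs)])]
    {:train (for [_     (range num-epochs)
                  batch (partition-all size (shuffle train))]
              (unzip batch))
     :test  (unzip test)}))
```

Because the result is plain data, it can be inspected in the REPL. Each `[feature-rows label-values]` pair would still need to become an ND4J DataSet (for instance with `Nd4j/create` and `DataSet.` as in the snippet earlier in this thread), and wrapping the batches back into a DataSetIterator for "train-net!" is a separate step this sketch does not cover.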

behrica, Nov 26 '18 21:11

I made attempts at this before and ran into the same complexity. I actually think the best way to do it is to write some Java code that provides a more Clojure-friendly DataSetIterator.

hswick, Nov 30 '18 18:11