core.matrix
reshape performance very bad with large double arrays
;; Assumes m is an alias for clojure.core.matrix and that c-for
;; (a C-style loop macro) is already in scope.
(defn reshape-time-test
  []
  (let [n-rows 100
        n-cols 1000
        src-array (double-array (* n-rows n-cols))]
    (println "reshape time")
    (time (dotimes [idx 100]
            (m/reshape src-array [n-rows n-cols])))
    (println "c-for time")
    (time (dotimes [idx 100]
            (let [^"[[D" dest (make-array Double/TYPE n-rows n-cols)]
              (c-for [row 0 (< row n-rows) (inc row)]
                     (java.lang.System/arraycopy src-array (* row n-cols) (get dest row) 0 n-cols)))))))
(reshape-time-test)
reshape time
"Elapsed time: 174760.275438 msecs"
c-for time
"Elapsed time: 19.301593 msecs"
nil
For sanity's sake, you may want to run this with the dotimes counts set to 10 instead of 100.
I researched this a bit and found that the slowdown likely has two sources:
First, aset-double is doing reflection ... and that lives in core.clj of Clojure itself. Second, I believe (mp/get-2d data i j) is doing nth on an array, which apparently is quite slow.
I ran into this while importing VGG16 from Keras into cortex.
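To illustrate the first point, here is a minimal sketch that writes the same values into a double array two ways: once with aset-double, which as far as I can tell goes through java.lang.reflect.Array, and once with a type-hinted aset, which compiles to a direct primitive array store. The function names are made up for this example and are not part of core.matrix.

```clojure
;; Sketch only: both functions produce the same array contents;
;; they differ in how each element is written.

(defn fill-with-aset-double
  "Writes i into slot i via aset-double."
  [n]
  (let [arr (double-array n)]
    (dotimes [i n]
      (aset-double arr i i))
    arr))

(defn fill-with-hinted-aset
  "Writes i into slot i via aset on a ^doubles-hinted array."
  [n]
  (let [^doubles arr (double-array n)]
    (dotimes [i n]
      (aset arr i (double i)))
    arr))

;; Wrap either call in (time ...) with a large n to compare the cost.
(println (vec (fill-with-aset-double 3)))
(println (vec (fill-with-hinted-aset 3)))
```

Setting `*warn-on-reflection*` to true while compiling code like this is a quick way to spot array calls that are missing type hints.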
So the problem here is fundamentally that we don't yet have a reshape operation for the :double-array implementation. Hence it is falling back to a default implementation, which certainly isn't optimised for the double-array case.
I'll take a look and see if I can optimise this at all.
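For reference, a specialised 2D path for double arrays could look roughly like the sketch below. It is the row-by-row arraycopy approach from the original report written as a standalone function. The name reshape-double-array-2d is hypothetical; this is not the core.matrix implementation, and it returns a plain double[][] rather than a core.matrix array type.

```clojure
;; Hypothetical helper, not part of core.matrix.
(defn reshape-double-array-2d
  "Copies a flat double[] into a new n-rows x n-cols double[][],
  one System/arraycopy per row. src must hold at least
  (* n-rows n-cols) elements."
  [^doubles src n-rows n-cols]
  (let [n-rows (long n-rows)
        n-cols (long n-cols)
        ^"[[D" dest (make-array Double/TYPE n-rows n-cols)]
    (dotimes [row n-rows]
      ;; Row `row` of dest receives the slice starting at row * n-cols.
      (System/arraycopy src (* row n-cols) (aget dest row) 0 n-cols))
    dest))

;; Usage: reshape six elements into 2 rows of 3.
(println (mapv vec (reshape-double-array-2d (double-array (range 6)) 2 3)))
```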
In the meantime, the obvious solution is just to use an implementation that plays nicely with Java double arrays:
(defn reshape-time-test
  []
  (let [n-rows 100
        n-cols 1000
        src-array (double-array (* n-rows n-cols))]
    (println "reshape time")
    (time (dotimes [idx 100]
            (m/reshape src-array [n-rows n-cols])))
    (println "vectorz time")
    (time (dotimes [idx 100]
            (m/reshape (array :vectorz src-array) [n-rows n-cols])))))
#'mikera.vectorz.matrix-api/reshape-time-test
=> (reshape-time-test)
reshape time
"Elapsed time: 294872.994923 msecs"
vectorz time
"Elapsed time: 49.254672 msecs"