special handling of "tensor" ?

Open behrica opened this issue 1 year ago • 12 comments

It would be nice if this worked:

(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.tensor :as dst])

(def tensor
  (-> (ds/->dataset {:x (range 5)
                     :y (range 7 12)})
      (dst/dataset->tensor)))

and then

(r.base/t (r/clj->r tensor))

(It does something, but not the right thing)

[[1]]
[1] 0 7

[[2]]
[1] 1 8

[[3]]
[1] 2 9

[[4]]
[1]  3 10

[[5]]
[1]  4 11


clj꞉clojisr.v1.r-test꞉> 
     [,1]      [,2]      [,3]      [,4]      [,5]     
[1,] numeric,2 numeric,2 numeric,2 numeric,2 numeric,2

or even better, that this does the right thing:

(r.base/t tensor)

I think we have similar special handling for tech.v3.dataset datasets; maybe we should do the same for tech.v3.tensor.

behrica avatar May 10 '24 15:05 behrica

These don't work either:

(r.base/matrix (r/clj->r tensor))
(r.base/matrix tensor)

behrica avatar May 10 '24 15:05 behrica

For reference, it does work like this:

(require '[tech.v3.tensor :as tens]
         '[tech.v3.datatype :as dtt])

(-> tensor
    tens/tensor->buffer
    (r.base/matrix :nrow (first (dtt/shape tensor))
                   :ncol (second (dtt/shape tensor)))
    r.base/t)
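
One caveat worth checking with the snippet above: R's matrix fills column by column by default, while the flat buffer of a tensor is, as far as I can tell, laid out row by row. Here is a minimal, library-free sketch of the difference, using the 5x2 data from the example. `fill-by-row` and `fill-by-column` are hypothetical helpers written for illustration, not clojisr or dtype-next functions.

```clojure
;; Row-major flattening of the 5x2 tensor from the example above
;; (assumed layout): rows are [0 7] [1 8] [2 9] [3 10] [4 11].
(def flat [0 7 1 8 2 9 3 10 4 11])

(defn fill-by-row
  "Rebuilds rows the way R's matrix(..., byrow = TRUE) would."
  [data nrow ncol]
  (vec (take nrow (map vec (partition ncol data)))))

(defn fill-by-column
  "Rebuilds rows the way R's matrix() does by default (column by column)."
  [data nrow ncol]
  (vec (for [r (range nrow)]
         (vec (for [c (range ncol)]
                ;; element (r, c) sits at offset r + c * nrow
                (nth data (+ r (* c nrow))))))))

(fill-by-row flat 5 2)
;; => [[0 7] [1 8] [2 9] [3 10] [4 11]]

(fill-by-column flat 5 2)
;; => [[0 9] [7 3] [1 10] [8 4] [2 11]]
```

If the buffer really is row-major, passing `:byrow true` to `r.base/matrix` (R's `byrow` argument) should keep the original layout. I have not verified this against clojisr itself.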

behrica avatar May 10 '24 15:05 behrica

Some remarks:

  • tech.v3.tensor tensors are multi-dimensional, while R matrices are 2D, so not all tensors can be converted to a matrix
  • tensors are implementations of https://github.com/cnuernber/dtype-next/blob/master/java/tech/v3/datatype/NDBuffer.java so maybe the integration could or should be based on this interface

behrica avatar May 10 '24 15:05 behrica

Transfer from Clojure to R should go through the Java Rserve library structures, which I believe is the optimal route. Here is how it's done for TMD: https://github.com/scicloj/clojisr/blob/master/src/clojisr/v1/impl/clj_to_java.clj#L32-L48

Possibly it can be done similarly for tensors as well.

genmeblog avatar May 10 '24 16:05 genmeblog

Tensors in R can be represented as multidimensional arrays, not just matrices. Here is something done in the past (it's a transfer of flat data into a 5D array): https://scicloj.github.io/clojisr/clojisr.v1.tutorials.dataset.html#matrices-arrays-multidimensional-arrays

genmeblog avatar May 10 '24 16:05 genmeblog

Multidimensional arrays / tables in R are represented as a flattened dataset on the Clojure side, like this 3D table: https://scicloj.github.io/clojisr/clojisr.v1.tutorials.dataset.html#table

genmeblog avatar May 10 '24 16:05 genmeblog

Ok. I learned that dtype tensors can indeed be represented in R as either a matrix or an array. Calling "class" on a 3D array in R gives:

> class(array(1:(3 * 4 * 5),dim=(c(3,4,5))))
[1] "array"
> 

while on "matrix" it gives:

> class(matrix(c(1,2,3,4)))
[1] "matrix" "array" 

Using "array" on 2D data gives a matrix as well:

> class(array(1:(3 * 4),dim=(c(3,4))))
[1] "matrix" "array" 

behrica avatar May 12 '24 10:05 behrica

Take a look at this line and the code below it, which convert a multidimensional structure to a flattened dataset. We can add another path to create tensors out of arrays. https://github.com/scicloj/clojisr/blob/master/src/clojisr/v1/impl/java_to_clj.clj#L94

genmeblog avatar May 12 '24 10:05 genmeblog

Yes, will do. To me this is especially unexpected and could be improved by returning a proper tensor:

(-> (r.base/array (range (* 3 4 5)) :dim [3 4 5])
    (r/r->clj))
;; => _unnamed [15 5]:
;;    
;;    | :$col-0 |  1 |  2 |  3 |  4 |
;;    |--------:|---:|---:|---:|---:|
;;    |       1 |  0 |  3 |  6 |  9 |
;;    |       1 |  1 |  4 |  7 | 10 |
;;    |       1 |  2 |  5 |  8 | 11 |
;;    |       2 | 12 | 15 | 18 | 21 |
;;    |       2 | 13 | 16 | 19 | 22 |
;;    |       2 | 14 | 17 | 20 | 23 |
;;    |       3 | 24 | 27 | 30 | 33 |
;;    |       3 | 25 | 28 | 31 | 34 |
;;    |       3 | 26 | 29 | 32 | 35 |
;;    |       4 | 36 | 39 | 42 | 45 |
;;    |       4 | 37 | 40 | 43 | 46 |
;;    |       4 | 38 | 41 | 44 | 47 |
;;    |       5 | 48 | 51 | 54 | 57 |
;;    |       5 | 49 | 52 | 55 | 58 |
;;    |       5 | 50 | 53 | 56 | 59 |

behrica avatar May 12 '24 10:05 behrica

It represents an R 3D array as a 2D data frame (with an extra column per dimension).
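
To illustrate what a "proper tensor" result could look like, here is a minimal, library-free sketch that rebuilds the [3 4 5] structure from the flat data, assuming R's column-major layout (first index varies fastest). `column-major->nested` is a hypothetical helper, not part of clojisr or dtype-next, and an actual implementation would presumably return a tech.v3.tensor rather than nested vectors.

```clojure
(defn column-major->nested
  "Turns flat, column-major data (R's array layout) with dims [d0 d1 d2]
  into nested vectors indexed as [i j k] (zero-based)."
  [flat [d0 d1 d2]]
  (vec (for [i (range d0)]
         (vec (for [j (range d1)]
                (vec (for [k (range d2)]
                       ;; column-major offset: i + d0*j + d0*d1*k
                       (nth flat (+ i (* d0 j) (* d0 d1 k))))))))))

;; Same data as (r.base/array (range (* 3 4 5)) :dim [3 4 5])
(def arr (column-major->nested (vec (range (* 3 4 5))) [3 4 5]))

(get-in arr [0 1 0]) ;; => 3   (first slice, first row, second column)
(get-in arr [1 0 1]) ;; => 13  (second slice, second row, first column)
(get-in arr [2 3 4]) ;; => 59  (last element)
```

The values match the flattened dataset above: 3 and 13 appear at the same slice, row and column positions in the printed table.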

behrica avatar May 12 '24 10:05 behrica

Yes, that was the idea: to turn any nd-array into a 2D dataset. I know this is not a perfect solution. At that time tensors weren't available (or I was not aware of them).

genmeblog avatar May 12 '24 11:05 genmeblog

I see. I started a discussion on Zulip; let's continue there.

behrica avatar May 12 '24 11:05 behrica