
EXLA error for large tensor

msluszniak opened this issue 1 year ago (status: open, 1 comment)

The following code

res =
  EXLA.jit(&Scholar.Manifold.MDS.fit(&1, key: &2, num_components: 2)).(
    Nx.iota({1000000, 3}),
    Nx.Random.key(42)
  )

gives an error:

** (RuntimeError) Unable to get dimensions.
    (exla 0.7.2) lib/exla/shape.ex:89: EXLA.Shape.unwrap!/1
    (exla 0.7.2) lib/exla/shape.ex:29: EXLA.Shape.make_shape/2
    (exla 0.7.2) lib/exla/defn.ex:914: EXLA.Defn.to_operator/4
    (exla 0.7.2) lib/exla/defn.ex:898: EXLA.Defn.cached_recur_operator/4
    (exla 0.7.2) lib/exla/defn.ex:657: EXLA.Defn.recur_operator/3
    (exla 0.7.2) lib/exla/defn.ex:2425: EXLA.Defn.recur_composite/4
    (elixir 1.15.5) lib/enum.ex:1819: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    #cell:lph4otuox3sqx2ec:2: (file)

For smaller tensors, such as Nx.iota({1000, 3}), the error does not occur.

msluszniak · Jun 06 '24 12:06

We killed the EXLA.Shape module where this error is raised. Does this still occur on main? That error only fires if there is a problem reading a value from the dimensions tuple: the code calls enif_get_tuple and then iterates over the elements with enif_get_int64. The only way that can fail is if an integer is out of range for the type, which it is not here, so this is confusing.
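The range check described above can be mirrored in plain Elixir as a rough sketch. `DimsCheck.fits?/2` is a hypothetical helper, not part of EXLA or the NIF; it only tests whether every entry of a dimensions tuple is an integer within the range of a signed integer of the given bit width, which is the condition a per-element enif_get_int64 read needs when the width is 64.

```elixir
defmodule DimsCheck do
  # Hypothetical helper (not part of EXLA): returns true when every entry
  # of a dimensions tuple is an integer that fits in a signed integer of
  # `bits` width. With bits == 64 this is the condition a per-element
  # enif_get_int64 read requires.
  def fits?(dims, bits) when is_tuple(dims) and bits > 0 do
    max = Integer.pow(2, bits - 1) - 1
    min = -max - 1

    dims
    |> Tuple.to_list()
    |> Enum.all?(fn d -> is_integer(d) and d >= min and d <= max end)
  end
end

# The shape from the report fits comfortably, even in 32 bits.
IO.inspect(DimsCheck.fits?({1_000_000, 3}, 64))
IO.inspect(DimsCheck.fits?({1_000_000, 3}, 32))
# A value just past the signed 64-bit range would be rejected.
IO.inspect(DimsCheck.fits?({Integer.pow(2, 63), 3}, 64))
```

This prints `true`, `true`, `false`, which matches the observation that the reported dimensions themselves are not out of range.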

seanmor5 · Jun 06 '24 23:06

I think I had already fixed this somewhere along the way; it was basically a wrong integer size in the code path that creates EXLA.Shape structs.
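As a general illustration of why a wrong integer size is dangerous, the sketch below uses Elixir binary syntax to store a value in a fixed-width signed field and read it back. `NarrowDemo.round_trip/2`, the 32-bit width, and the example values are assumptions chosen for illustration; the thread does not say which widths or values were involved in EXLA. Values inside the signed 32-bit range survive the round trip, while a hypothetical larger value such as 1_000_000 * 1_000_000 is silently wrapped rather than rejected.

```elixir
defmodule NarrowDemo do
  # Hypothetical helper: writes `value` into a signed field of `bits` width
  # and reads it back. Binary construction keeps only the low `bits` bits,
  # so out-of-range values wrap instead of raising an error.
  def round_trip(value, bits) do
    <<narrowed::signed-size(bits)>> = <<value::signed-size(bits)>>
    narrowed
  end
end

IO.inspect(NarrowDemo.round_trip(1_000_000, 32))              # prints 1000000
IO.inspect(NarrowDemo.round_trip(1_000_000 * 1_000_000, 32))  # prints -727379968
IO.inspect(NarrowDemo.round_trip(1_000_000 * 1_000_000, 64))  # prints 1000000000000
```

The wrapped result in the second line is the kind of silent corruption a mismatched integer width can introduce, which is why such bugs may only surface for sufficiently large sizes.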

polvalente · Jul 11 '24 05:07