
org.apache.spark.sql.mleap.TypeConverters cannot convert 2D tensor to Matrix

Open austinzh opened this issue 1 year ago • 0 comments

The current implementation always converts a Tensor to a Vector. The bug is hidden in tt.dimensions.size: tt.dimensions is an Option[Seq[Int]], so calling size on a Some returns 1 and calling size on a None returns 0, regardless of how many dimensions the wrapped Seq holds. As a result, in the following code a TensorType with defined dimensions is always converted to VectorUDT:

  def mleapTensorToSpark(tt: types.TensorType): DataType = {
    assert(TypeConverters.VECTOR_BASIC_TYPES.contains(tt.base),
      s"cannot convert tensor with base ${tt.base} to vector")
    assert(tt.dimensions.isDefined, "cannot convert tensor with undefined dimensions")

    if(tt.dimensions.isEmpty) {
      mleapBasicTypeToSparkType(tt.base)
    } else if(tt.dimensions.size == 1) {
      new VectorUDT
    } else if(tt.dimensions.size == 2) {
      new MatrixUDT
    } else {
      throw new IllegalArgumentException("cannot convert tensor for non-scalar, vector or matrix tensor")
    }
  }

The same bug exists in the mleapToSparkValue function as well.
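
A minimal, dependency-free sketch of the problem and one possible fix. The names `TensorRankSketch`, `Kind`, `buggyRank` and `classify` are hypothetical and used only for illustration. The real converter returns Spark `DataType` instances (`VectorUDT`, `MatrixUDT`), which are replaced here by plain case objects so the snippet runs without MLeap or Spark on the classpath.

```scala
object TensorRankSketch {
  // Stand-ins for the Spark types the real converter returns (assumption:
  // used here only so the example has no Spark/MLeap dependency).
  sealed trait Kind
  case object ScalarKind extends Kind
  case object VectorKind extends Kind
  case object MatrixKind extends Kind

  // Mirrors the reported bug: size is taken on the Option itself, so the
  // result is 1 for any Some and 0 for None, whatever the Seq contains.
  def buggyRank(dimensions: Option[Seq[Int]]): Int = dimensions.size

  // One possible fix: unwrap the Option and branch on the length of the Seq.
  def classify(dimensions: Option[Seq[Int]]): Kind = dimensions match {
    case None =>
      throw new IllegalArgumentException(
        "cannot convert tensor with undefined dimensions")
    case Some(dims) =>
      dims.length match {
        case 0 => ScalarKind
        case 1 => VectorKind
        case 2 => MatrixKind
        case _ =>
          throw new IllegalArgumentException(
            "cannot convert tensor for non-scalar, vector or matrix tensor")
      }
  }
}
```

With this sketch, `TensorRankSketch.buggyRank(Some(Seq(3, 4)))` evaluates to 1 even though the tensor is two-dimensional, while `TensorRankSketch.classify(Some(Seq(3, 4)))` yields `MatrixKind`. In the actual code the equivalent change would be to compare the length of the Seq inside tt.dimensions rather than the size of the Option.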

austinzh, Jun 27 '23 19:06