Mobius icon indicating copy to clipboard operation
Mobius copied to clipboard

User defined struct function not supported yet

Open dylangmiles opened this issue 7 years ago • 1 comments

I am trying to use the struct UDF and I get an exception specifying it is not yet supported. I have found a work around using sql, but is there a recommendation as to which technique is better?

And are you planning to implement the struct UDF at some point?

var schema = new StructType(new List<StructField>()
{
    new StructField("id", new IntegerType()),
    new StructField("foo", new StringType()),
    new StructField("bar", new StringType()),
});

var data = _sparkContext.Parallelize(new List<object[]>()
{
    new object[] {1, "a", "b" },
    new object[] {2, "x", "y"},
    new object[] {3, "c", "d"},
});

var df = _sqlContext.CreateDataFrame(data, schema);

df.Show(10);


//Option1: sql
var option1 = df.SelectExpr("id", "named_struct('foo1', foo, 'bar2', bar)");

option1.Show(10);



//Option2: Column expressions
var option2 = df.Select(df["id"], Functions.Struct(df["foo"].Alias("foo2"), df["bar"].Alias("bar2")));
option2.Show(10);

DataAggregatorService.Test.AssessmentAggregatorTest.Test_UserDefinedFunctions threw exception: 
System.NotSupportedException: Type System.Linq.Enumerable+WhereSelectArrayIterator`2[Microsoft.Spark.CSharp.Sql.Column,Microsoft.Spark.CSharp.Proxy.IColumnProxy] not supported yet
    at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.GetTypeId(Type type)
   at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.ConvertParametersToBytes(Object[] parameters, Boolean addTypeIdPrefix)
   at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.BuildPayload(Boolean isStaticMethod, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
   at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
   at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallStaticJavaMethod(String className, String methodName, Object[] parameters)
   at Microsoft.Spark.CSharp.Proxy.Ipc.SparkContextIpcProxy.CreateFunction(String name, Object self)
   at Microsoft.Spark.CSharp.Sql.Functions.Struct(Column[] columns)
   at DataAggregatorService.Test.AssessmentAggregatorTest.Test_UserDefinedFunctions() in C:\_sourceCode2005\CCI\DigiTRACC\DigiTRACC 6.0\DataAggregatorService.Test\AggregatorTest.cs:line 119


[2017-07-24T13:19:34.3296329Z] [ATHENA-DESKTOP] [Info] [SparkContext] Parallelizing 3 items to form RDD in the cluster with 1 partitions
[2017-07-24T13:19:34.5136299Z] [ATHENA-DESKTOP] [Info] [RDD`1] Executing Map operation on RDD (preservesPartitioning=False)
[2017-07-24T13:19:36.0396218Z] [ATHENA-DESKTOP] [Info] [DataFrame] Writing 10 rows in the DataFrame to Console output
+---+---+---+
| id|foo|bar|
+---+---+---+
|  1|  a|  b|
|  2|  x|  y|
|  3|  c|  d|
+---+---+---+

[2017-07-24T13:19:37.1106129Z] [ATHENA-DESKTOP] [Info] [DataFrame] Writing 10 rows in the DataFrame to Console output
+---+----------------------------------+
| id|named_struct(foo1, foo, bar2, bar)|
+---+----------------------------------+
|  1|                             [a,b]|
|  2|                             [x,y]|
|  3|                             [c,d]|
+---+----------------------------------+

[2017-07-24T13:19:37.5746108Z] [ATHENA-DESKTOP] [Exception] [JvmBridge] Type System.Linq.Enumerable+WhereSelectArrayIterator`2[Microsoft.Spark.CSharp.Sql.Column,Microsoft.Spark.CSharp.Proxy.IColumnProxy] not supported yet
   at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.GetTypeId(Type type)
   at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.ConvertParametersToBytes(Object[] parameters, Boolean addTypeIdPrefix)
   at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.BuildPayload(Boolean isStaticMethod, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
   at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)

dylangmiles avatar Jul 24 '17 13:07 dylangmiles

UDF is not recommended because of per overhead in JVM-CLR interop. SQL is a better option.

skaarthik avatar Sep 15 '17 17:09 skaarthik