Mobius
User defined struct function not supported yet
I am trying to use the struct function (Functions.Struct) and I get an exception saying the type is not supported yet. I have found a workaround using SQL (SelectExpr with named_struct), but is there a recommendation as to which technique is better?
And are you planning to implement the struct UDF at some point?
var schema = new StructType(new List<StructField>()
{
    new StructField("id", new IntegerType()),
    new StructField("foo", new StringType()),
    new StructField("bar", new StringType()),
});
var data = _sparkContext.Parallelize(new List<object[]>()
{
    new object[] { 1, "a", "b" },
    new object[] { 2, "x", "y" },
    new object[] { 3, "c", "d" },
});
var df = _sqlContext.CreateDataFrame(data, schema);
df.Show(10);

// Option 1: SQL
var option1 = df.SelectExpr("id", "named_struct('foo1', foo, 'bar2', bar)");
option1.Show(10);

// Option 2: Column expressions
var option2 = df.Select(df["id"], Functions.Struct(df["foo"].Alias("foo2"), df["bar"].Alias("bar2")));
option2.Show(10);
DataAggregatorService.Test.AssessmentAggregatorTest.Test_UserDefinedFunctions threw exception:
System.NotSupportedException: Type System.Linq.Enumerable+WhereSelectArrayIterator`2[Microsoft.Spark.CSharp.Sql.Column,Microsoft.Spark.CSharp.Proxy.IColumnProxy] not supported yet
at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.GetTypeId(Type type)
at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.ConvertParametersToBytes(Object[] parameters, Boolean addTypeIdPrefix)
at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.BuildPayload(Boolean isStaticMethod, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallStaticJavaMethod(String className, String methodName, Object[] parameters)
at Microsoft.Spark.CSharp.Proxy.Ipc.SparkContextIpcProxy.CreateFunction(String name, Object self)
at Microsoft.Spark.CSharp.Sql.Functions.Struct(Column[] columns)
at DataAggregatorService.Test.AssessmentAggregatorTest.Test_UserDefinedFunctions() in C:\_sourceCode2005\CCI\DigiTRACC\DigiTRACC 6.0\DataAggregatorService.Test\AggregatorTest.cs:line 119
[2017-07-24T13:19:34.3296329Z] [ATHENA-DESKTOP] [Info] [SparkContext] Parallelizing 3 items to form RDD in the cluster with 1 partitions
[2017-07-24T13:19:34.5136299Z] [ATHENA-DESKTOP] [Info] [RDD`1] Executing Map operation on RDD (preservesPartitioning=False)
[2017-07-24T13:19:36.0396218Z] [ATHENA-DESKTOP] [Info] [DataFrame] Writing 10 rows in the DataFrame to Console output
+---+---+---+
| id|foo|bar|
+---+---+---+
|  1|  a|  b|
|  2|  x|  y|
|  3|  c|  d|
+---+---+---+
[2017-07-24T13:19:37.1106129Z] [ATHENA-DESKTOP] [Info] [DataFrame] Writing 10 rows in the DataFrame to Console output
+---+----------------------------------+
| id|named_struct(foo1, foo, bar2, bar)|
+---+----------------------------------+
|  1|                             [a,b]|
|  2|                             [x,y]|
|  3|                             [c,d]|
+---+----------------------------------+
[2017-07-24T13:19:37.5746108Z] [ATHENA-DESKTOP] [Exception] [JvmBridge] Type System.Linq.Enumerable+WhereSelectArrayIterator`2[Microsoft.Spark.CSharp.Sql.Column,Microsoft.Spark.CSharp.Proxy.IColumnProxy] not supported yet
at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.GetTypeId(Type type)
at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.ConvertParametersToBytes(Object[] parameters, Boolean addTypeIdPrefix)
at Microsoft.Spark.CSharp.Interop.Ipc.PayloadHelper.BuildPayload(Boolean isStaticMethod, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] parameters)
UDFs are not recommended because of the performance overhead of JVM-CLR interop. SQL is the better option.
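For reference, a minimal sketch of the SQL route, reusing the df from the question. It is a fragment rather than a complete program, and the alias foobar is a hypothetical name chosen for illustration. Without an alias the struct column is named after the whole expression, as the console output above shows.

```csharp
// Sketch only: assumes the `df` DataFrame built in the question (columns id, foo, bar).
// `foobar` is a hypothetical alias, not a name from the original post.
var withStruct = df.SelectExpr(
    "id",
    "named_struct('foo1', foo, 'bar2', bar) AS foobar");
withStruct.Show(10);

// Fields of the struct can then be read back with dotted SQL syntax.
var foo1Only = withStruct.SelectExpr("id", "foobar.foo1");
foo1Only.Show(10);
```

Both calls go through SelectExpr, the same method Option 1 uses, so the expression is evaluated entirely on the JVM side.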