Error from many DataFrame methods after a UDF is used in DataFrame.WithColumn
I'm a long-time C# programmer, but I'm just getting my feet wet with .NET for Apache Spark. Following many "getting started" guides and videos, I installed:
- 7-Zip
- Java 8
- Apache Spark, downloaded from https://spark.apache.org/downloads.html
- .NET for Apache Spark v2.1.1
- WinUtils.exe

I'm running this on Windows 10.
Problem: When I call DataFrame.Show() after calling DataFrame.WithColumn() with a UDF, I always get this error:

```
[2023-02-07T15:45:31.3903664Z] [DESKTOP-H37P8Q0] [Error] [TaskRunner] [0] ProcessStream() failed with exception: System.ArgumentNullException: Value cannot be null. Parameter name: type
```
TestCases.csv looks like this: TestCases.csv
OrderList.csv looks like this: OrderList.csv
Here is the Program class of the TestSparkApp console project, Program.cs.txt, and its supporting classes, Player.cs.txt and Collector.cs.txt.
Here is the output of the above app: TestSpartAppOutput.txt
Note that the same error appears when executing many different methods on the DataFrame object, but only after a call to WithColumn that uses a UDF. In this case, the code looks like this:
```csharp
// Requires: using System; using Microsoft.Spark.Sql;
//           using static Microsoft.Spark.Sql.Functions;

// user-defined function wrapping GetSubstance(string, string), which returns an int
Func<Column, Column, Column> GetSubst = Udf<string, string, int>(
    (strOrder, strPlayers) =>
    {
        return GetSubstance(strOrder, strPlayers);
    });

// call the user-defined function and add a new column to the DataFrame
ordersFrame = ordersFrame.WithColumn("substance", GetSubst(ordersFrame["names"], ordersFrame["players"]).Cast("Integer"));

// *** This is where the error is thrown; if I comment it out, the same error is thrown later
// print out the data
ordersFrame.Show(20, 20, false);
```
However, I've tried it with other UDFs followed by other DataFrame method calls, and I always get the same error. Later in the Main() function there is a foreach loop: if I comment out the ordersFrame.Show() call and uncomment the body of the loop, I get the same error when I access row.Values[0].ToString().
Could I have missed something in my installation?
Desktop:
- OS: Windows 10
- Browser: n/a
- Version: see above
Well, it has been 5 days, and I'm getting crickets.
I noticed that other questions have gone unanswered for long periods of time, and those that did get responses had to wait weeks, if not months.
Should I interpret this to mean that .NET for Apache Spark has been sundowned and is no longer supported? Is this a dead product that we should not incorporate into new development?
Thanks, dogulas-accip