
Error from many DataFrame methods after UDF called in DataFrame.WithColumn

Open dogulas-accip opened this issue 2 years ago • 1 comments

I'm a long-time C# programmer but am just getting my feet wet with .NET for Apache Spark. Following many "getting started" instructions and videos, I installed:

  • 7-Zip
  • Java 8
  • Apache Spark, downloaded from https://spark.apache.org/downloads.html
  • .NET for Apache Spark v2.1.1
  • WinUtils.exe

I'm running this on Windows 10.

Problem: When I call DataFrame.Show() after doing a DataFrame.WithColumn() using a UDF, I always get an error:

    [2023-02-07T15:45:31.3903664Z] [DESKTOP-H37P8Q0] [Error] [TaskRunner] [0] ProcessStream() failed with exception: System.ArgumentNullException: Value cannot be null. Parameter name: type

TestCases.csv looks like this: TestCases.csv

OrderList.csv looks like this: OrderList.csv

Here is the Program class of the TestSparkApp console project: Program.cs.txt and supporting classes: Player.cs.txt Collector.cs.txt

Here is the output of the above app: TestSpartAppOutput.txt

Note that the same error appears when executing many different methods on the DataFrame object, but only after a call to the WithColumn method using a UDF. In this case, the code looks like this:

    // user-defined function
    Func<Column, Column, Column> GetSubst = Udf<string, string, int>(
        (strOrder, strPlayers) =>
        {
            return GetSubstance(strOrder, strPlayers);
        });

    // call the user-defined function and add a new column to the DataFrame
    ordersFrame = ordersFrame.WithColumn(
        "substance",
        GetSubst(ordersFrame["names"], ordersFrame["players"]).Cast("Integer"));

    // *** This is where the error will be thrown, but if I comment it out, the same error will be thrown later
    // print out the data
    ordersFrame.Show(20, 20, false);

However, I've tried it with other UDFs followed by other DataFrame method calls, and I always get the same error. In the Main() function, you will see a later foreach loop. If I comment out the ordersFrame.Show() call and uncomment the contents of the loop, I get the same error when I access row.Values[0].ToString().

I wonder if I have missed something in my installation?

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: n/a
  • Version: see above

dogulas-accip, Feb 08 '23 16:02

Well, it has been 5 days and I'm getting crickets.

I noticed that other questions have gone without a response for long periods of time, and those that did get a response had to wait weeks, if not months.

Should I interpret this to mean that .NET for Apache Spark is sundowned and no longer supported? Is this a dead product and we should not incorporate it in new development?

Thanks, dogulas-accip

dogulas-accip, Feb 13 '23 14:02