Question: How to use the DataFrame API to achieve the equivalent of map/reduce in Spark.NET

Open JunweiSUN opened this issue 1 year ago • 0 comments

Hi, we have a scenario that requires map/reduce-style functions in Spark.NET. For example, we want to call:

public IEnumerable<object[]> MapCallback(IEnumerable<Row> input)
{
   // do something with `IEnumerable<Row> input`
}

df.Rdd.MapPartitions(MapCallback, true);

The point is that we need the IEnumerable<Row> input so that we can operate at the row level. In Mobius we could access all the Rdd-related APIs, but according to this issue it seems that the Rdd-related APIs are no longer accessible.

So we have the following questions:

  1. Is there any API in the current Spark.NET that is exactly equivalent to Rdd.Map, Rdd.Reduce, and the other map/reduce-related functions? Note that we need to handle an IEnumerable<Row> in which a row can have an arbitrary number of elements, i.e., we may not know how many elements (columns) a row has until runtime.
  2. If the answer to 1 is no, can we download the source code, change the visibility of the Rdd-related APIs to public, and build private bits for our own use?
  3. Any other related suggestions would be much appreciated.

Looking forward to your answer! Thanks a lot!

JunweiSUN, Aug 03 '23 13:08