RemoteShuffleService

Spark 3.1/3.2?

Open cpd85 opened this issue 3 years ago • 10 comments

Hi all, I saw in the README that there is a spark30 branch supporting Spark 3.0.x. There also seems to be a spark31 branch, but I'm wondering: are there any plans to support Spark 3.2, or could it work out of the box with the spark31 branch?

cpd85 avatar Feb 02 '22 00:02 cpd85

Yeah, agreed, it is confusing here. Spark 3.1 and 3.2 have slightly different shuffle APIs, so Remote Shuffle Service needs to change accordingly. I used to work on Remote Shuffle Service when I was at Uber. I have since left Uber and no longer have write access to this repo.

Which environment do you want to run Remote Shuffle Service in, e.g. YARN or Kubernetes? If Kubernetes, I have another repo that makes Remote Shuffle Service compatible with Kubernetes for Spark 3.1 and 3.2.

hiboyang avatar Feb 06 '22 07:02 hiboyang

@hiboyang thanks for the response -- I really appreciate it! For now, I would love to be able to run on YARN. I would love to explore Kubernetes as well. If you can point me to the repo/changes you made for compatibility, maybe I could extend it to run on YARN as well?

cpd85 avatar Feb 07 '22 23:02 cpd85

I see. In that case, you could change <spark.version>2.4.3</spark.version> in pom.xml to a Spark 3 version. You will get some compile errors, and you could start from there.

I tried to find some time to provide an example, but I am really busy these days :(

hiboyang avatar Feb 10 '22 02:02 hiboyang

@hiboyang I am looking to deploy Remote Shuffle Service in my Kubernetes cluster, preferably for Spark 3.1.1. What's your recommendation?

roligupt avatar Feb 18 '22 03:02 roligupt

Hi!

Support for Spark 3.2 would be very interesting. Java 11 is also required there. I tried changing some parameters for Spark 3.2, for example,

<java.version>11</java.version>
<hadoop.version>3.2.2</hadoop.version>
<spark.version>3.2.0</spark.version>
<scala.version>2.12.15</scala.version>

but I get an error:

[ERROR] /home/alatau/ssk/3.2/src/main/scala/org/apache/spark/shuffle/rss/RssStressTool.scala:144: not enough arguments for method registerShuffle: (shuffleId: Int, numMaps: Int, numReduces: Int)Unit.
Unspecified value parameter numReduces.
[ERROR]     mapOutputTrackerMaster.registerShuffle(appShuffleId.getShuffleId, numMaps)
[ERROR]                                           ^
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

avs-alatau avatar Apr 05 '22 12:04 avs-alatau

@avs-alatau as @hiboyang mentioned, there's a difference in APIs, so it's not enough to just change spark.version -- you'll need to implement the new APIs as well. Bo has done the work here, but it's only running on k8s at the moment: https://github.com/hiboyang/RemoteShuffleService/tree/k8s-spark-3.2
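
To make the API difference concrete: the compile error above shows that under Spark 3.2 registerShuffle expects a third argument, numReduces, while the existing call site passes only two. In a 3.2-only branch the direct fix is to pass the reduce count at that call site. If one source tree had to cope with both shapes, one possible approach (an assumption on my part, not what this repo or Bo's branch does) is a small reflection shim. The sketch below is self-contained and uses hypothetical stand-in classes (OldTracker, NewTracker) and a hypothetical helper (RegisterShuffleShim) rather than real Spark classes:

```scala
// Hypothetical stand-ins for the two API shapes. These are NOT Spark classes;
// they only mimic the two registerShuffle signatures discussed in this thread.
class OldTracker {
  var registered: List[(Int, Int)] = Nil
  def registerShuffle(shuffleId: Int, numMaps: Int): Unit =
    registered ::= ((shuffleId, numMaps))
}

class NewTracker {
  var registered: List[(Int, Int, Int)] = Nil
  def registerShuffle(shuffleId: Int, numMaps: Int, numReduces: Int): Unit =
    registered ::= ((shuffleId, numMaps, numReduces))
}

// Hypothetical helper: calls whichever registerShuffle variant the tracker exposes.
object RegisterShuffleShim {
  def register(tracker: AnyRef, shuffleId: Int, numMaps: Int, numReduces: Int): Unit = {
    val cls = tracker.getClass
    val int = classOf[Int] // primitive int, which is how Scala Int parameters compile
    try {
      // Prefer the three-argument form (the shape reported in the error above).
      cls.getMethod("registerShuffle", int, int, int)
        .invoke(tracker, Int.box(shuffleId), Int.box(numMaps), Int.box(numReduces))
    } catch {
      case _: NoSuchMethodException =>
        // Fall back to the two-argument form and drop numReduces.
        cls.getMethod("registerShuffle", int, int)
          .invoke(tracker, Int.box(shuffleId), Int.box(numMaps))
    }
    ()
  }
}
```

Calling RegisterShuffleShim.register(new NewTracker, 7, 4, 2) records (7, 4, 2), and the same call on an OldTracker records (7, 4). Reflection trades compile-time checking for flexibility, so keeping a separate branch per Spark version, as the spark30/spark31 branches do, is the simpler option when only one version has to be supported.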

cpd85 avatar Apr 05 '22 15:04 cpd85

@cpd85 thanks for the link to the k8s branch, but at the moment I can only set this up on YARN.

avs-alatau avatar Apr 05 '22 16:04 avs-alatau

@avs-alatau could you help me understand what you're asking for? The code doesn't exist or isn't open source for YARN. At the moment I'm fighting through these compilation issues to see if I can get a 3.2 client to communicate with a 2.4 server. I'll be happy to share the code if I end up getting it working.

cpd85 avatar Apr 05 '22 17:04 cpd85

@cpd85 Thanks for the help. I have a Hadoop cluster with Spark 3.2. Spark jobs currently run through YARN, and there are some problems with that, which is why I am looking for an external shuffle service. I managed to set up Spark jobs on a test cluster with Spark 3.0, but because Spark 3.2 is installed on the production cluster, I need an external shuffle service that supports it. If you manage to build an RSS version for Spark 3.2, I will be grateful.

avs-alatau avatar Apr 05 '22 18:04 avs-alatau

@avs-alatau I haven't done much testing, but I got this to work with a Spark 3.2 PageRank example app:

https://github.com/cpd85/RemoteShuffleService/tree/spark32

cpd85 avatar Apr 11 '22 17:04 cpd85