mrjob

allow Spark master to be specified with 'spark.master'

Open stug opened this issue 5 years ago • 5 comments

Currently the Spark runner expects the Spark master to be passed with the --spark-master option. However, it also takes arbitrary Spark configuration properties in the form --jobconf PROP=VALUE. Spark allows the master to be specified with the spark.master property, so the Spark runner should ideally understand --jobconf spark.master=MASTER as well.

stug avatar Mar 21 '19 22:03 stug

This actually applies to all runners, so updated the description.

coyotemarin avatar Mar 21 '19 23:03 coyotemarin

Need to check what happens if we pass conflicting --conf spark.master=... and --master=... to spark-submit.

coyotemarin avatar Mar 21 '19 23:03 coyotemarin

It looks like it's order-dependent; spark-submit basically treats --master=... as an alias for --conf spark.master=....

coyotemarin avatar Mar 22 '19 17:03 coyotemarin

I can see this potentially being an issue for some users, but for now we tell people that Spark master and deploy mode should be set explicitly (or that it's hard-coded for a particular runner).

coyotemarin avatar Mar 22 '19 17:03 coyotemarin

Probably the simplest way to implement this in mrjob is the opposite of the way spark-submit does it. Basically, when spark.master is in the dictionary for a jobconf opt, we override the spark_master opt as well.

If we want to be extra tidy, we can avoid putting --conf spark.master=... in spark-submit command lines and pass the value through the spark_master opt (that is, --master) instead.

coyotemarin avatar Dec 09 '19 19:12 coyotemarin
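
The approach proposed in the last comment could be sketched roughly as follows. This is only an illustration, not mrjob's actual code: the function names `apply_spark_master_from_jobconf` and `spark_submit_args` are hypothetical, and the sketch assumes the runner's options are available as a plain dict with `spark_master` and `jobconf` keys.

```python
def apply_spark_master_from_jobconf(opts):
    """Return a copy of opts where jobconf's spark.master (if any)
    overrides the spark_master opt.

    Hypothetical helper for illustration; not part of mrjob.
    """
    opts = dict(opts)
    jobconf = dict(opts.get('jobconf') or {})

    # pop() removes the key, so it is not also emitted as --conf later
    # (the "extra tidy" variant from the comment above)
    master = jobconf.pop('spark.master', None)
    if master is not None:
        opts['spark_master'] = master

    opts['jobconf'] = jobconf
    return opts


def spark_submit_args(opts):
    """Build the master/conf portion of a spark-submit command line
    from already-normalized opts (hypothetical helper)."""
    args = []
    if opts.get('spark_master'):
        args += ['--master', opts['spark_master']]
    # sorted only to make the output deterministic
    for key, value in sorted((opts.get('jobconf') or {}).items()):
        args += ['--conf', '%s=%s' % (key, value)]
    return args


if __name__ == '__main__':
    raw = {
        'spark_master': 'local[*]',
        'jobconf': {'spark.master': 'yarn', 'spark.executor.memory': '2g'},
    }
    print(spark_submit_args(apply_spark_master_from_jobconf(raw)))
```

Because the master reaches spark-submit only once, as `--master`, the question of how spark-submit resolves a conflicting `--master` and `--conf spark.master=...` never arises.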