
Does mmlspark support running on yarn?

Open: janelu9 opened this issue Jan 08 '20 • 10 comments

If I can't run LightGBM on YARN, the original version rather than the Spark version may be the best choice.

janelu9 avatar Jan 08 '20 01:01 janelu9

@janelu9 Yes, it should; hope that resolves your question. Is this question specifically related to one of the estimators (e.g., LIME, LightGBM, KNN, CNTKModel)?

imatiach-msft avatar Jan 13 '20 06:01 imatiach-msft

> @janelu9 Yes, it should; hope that resolves your question. Is this question specifically related to one of the estimators (e.g., LIME, LightGBM, KNN, CNTKModel)?

I can't import mmlspark in pyspark when the master is yarn. The command is like:

pyspark --master yarn --jars file:///root/.ivy2/jars/*

But I can succeed in spark-shell:

spark-shell --master yarn --jars file:///root/.ivy2/jars/*

How can I import mmlspark successfully if I just have the jars and can't connect to the Internet?

janelu9 avatar Jan 13 '20 11:01 janelu9

root@DESKTOP-OPMDKT7:~/.ivy2# spark-shell --master yarn --jars file:///root/.ivy2/jars/*
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = yarn, app id = application_1578918883660_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.microsoft.ml.spark.lightgbm._
import com.microsoft.ml.spark.lightgbm._

root@DESKTOP-OPMDKT7:~/.ivy2# pyspark --master yarn --jars file:///root/.ivy2/jars/*
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 3.7.0 (default, Jun 28 2018 13:15:42)
SparkSession available as 'spark'.

>>> from mmlspark.lightgbm import LightGBMRegressor as lgb
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'mmlspark'

janelu9 avatar Jan 13 '20 12:01 janelu9
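The gap between the two transcripts is that `--jars` only places the assembly on the JVM classpath; nothing adds the bundled `mmlspark` Python package to the Python path, which is why the Scala import succeeds while the Python one fails. A minimal sketch of an offline setup that covers both sides follows; the jar filename is an illustrative assumption, and the key point is that the same jar is handed to both the JVM (`spark.jars`) and the Python side (`spark.submit.pyFiles`, the conf behind `--py-files`), with no Internet access needed since the file is already local:

# Sketch only. Launch with spark-submit so the confs are processed before the
# driver starts (the jar path is an illustrative assumption):
#
#   spark-submit --master yarn \
#     --conf spark.jars=file:///root/.ivy2/jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar \
#     --conf spark.submit.pyFiles=file:///root/.ivy2/jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar \
#     job.py
#
# job.py:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The MMLSpark assembly jar bundles the mmlspark Python package, so once the
# jar is on the Python path the import resolves, and once it is on the JVM
# classpath the wrapped Scala class can be instantiated.
from mmlspark.lightgbm import LightGBMRegressor
model = LightGBMRegressor(objective="regression")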

@imatiach-msft I tried to start pyspark this way:

pyspark --master yarn \
--conf spark.dist.pyFiles=file:///....jar \
--conf spark.submit.pyFiles=file:///....jar \
--conf spark.yarn.dist.jars=file:///....jar

It still can't be used normally, though mmlspark can be imported:

Using Python version 3.7.0 (default, Jun 28 2018 13:15:42)
SparkSession available as 'spark'.

>>> from mmlspark.lightgbm import LightGBMRegressor as lgb
>>> lgb()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/e/bash/opt/spark/python/pyspark/__init__.py", line 105, in wrapper
    return func(self, **kwargs)
  File "/tmp/spark-47f3ea34-af3b-4ab9-a424-882de6e9b7bf/userFiles-a7e893d1-bcdc-40e8-bf88-8c43fd10d5b0/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar/mmlspark/lightgbm/_LightGBMRegressor.py", line 63, in __init__
  File "/mnt/e/bash/opt/spark/python/pyspark/ml/wrapper.py", line 63, in _new_java_obj
    return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
>>>   

janelu9 avatar Jan 13 '20 13:01 janelu9
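`TypeError: 'JavaPackage' object is not callable` is py4j's way of reporting that the driver JVM never loaded the Scala class `com.microsoft.ml.spark.lightgbm.LightGBMRegressor`: the Python wrapper imported fine (the jar reached the Python path via `spark.submit.pyFiles`), but the jar itself is missing from the JVM classpath. One plausible cause in the attempt above is that the jar was passed only through `spark.yarn.dist.jars`, not `spark.jars`. A small diagnostic sketch, assuming a live `spark` session:

# Diagnostic sketch: check whether the MMLSpark jar reached the driver JVM.
# py4j returns a JavaPackage for any dotted name it cannot resolve to a class,
# so seeing JavaPackage here means the jar is not on the JVM classpath.
pkg = spark.sparkContext._jvm.com.microsoft.ml.spark.lightgbm.LightGBMRegressor
print(type(pkg))  # py4j.java_gateway.JavaClass if loaded, JavaPackage if not

# Also inspect which jar-related confs the driver actually received:
conf = spark.sparkContext.getConf()
for key in ("spark.jars", "spark.yarn.dist.jars", "spark.yarn.jars"):
    print(key, "=", conf.get(key, "not set"))

The usual fix direction is to pass the jar through `--jars`/`spark.jars` (or stage it with `spark.yarn.jars`, discussed later in this thread) so the driver and executors both load it, rather than only distributing the file to containers.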

Hi! I'm running into the same issue, any idea how to fix it?

I'm using python 3.7, Pyspark version 2.4.4, MMLSpark version 1.0.0

[screenshot]

pedromcvaz avatar Jul 17 '20 09:07 pedromcvaz

mmlspark 2.11-0.18.1

/usr/share/spark/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
    108                 raise TypeError("Method %s forces keyword arguments." % func.__name__)
    109             self._input_kwargs = kwargs
--> 110             return func(self, **kwargs)
    111         return wrapper
    112

/usr/local/lib/python3.6/site-packages/mmlspark/lightgbm/_LightGBMRegressor.py in __init__(self, alpha, baggingFraction, baggingFreq, baggingSeed, boostFromAverage, boostingType, categoricalSlotIndexes, categoricalSlotNames, defaultListenPort, earlyStoppingRound, featureFraction, featuresCol, initScoreCol, isProvideTrainingMetric, labelCol, lambdaL1, lambdaL2, learningRate, maxBin, maxDepth, minSumHessianInLeaf, modelString, numBatches, numIterations, numLeaves, objective, parallelism, predictionCol, timeout, tweedieVariancePower, useBarrierExecutionMode, validationIndicatorCol, verbosity, weightCol)
     61     def __init__(self, alpha=0.9, baggingFraction=1.0, baggingFreq=0, baggingSeed=3, boostFromAverage=True, boostingType="gbdt", categoricalSlotIndexes=None, categoricalSlotNames=None, defaultListenPort=12400, earlyStoppingRound=0, featureFraction=1.0, featuresCol="features", initScoreCol=None, isProvideTrainingMetric=False, labelCol="label", lambdaL1=0.0, lambdaL2=0.0, learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, modelString="", numBatches=0, numIterations=100, numLeaves=31, objective="regression", parallelism="data_parallel", predictionCol="prediction", timeout=1200.0, tweedieVariancePower=1.5, useBarrierExecutionMode=False, validationIndicatorCol=None, verbosity=1, weightCol=None):
     62         super(_LightGBMRegressor, self).__init__()
---> 63         self._java_obj = self._new_java_obj("com.microsoft.ml.spark.lightgbm.LightGBMRegressor")
     64         self.alpha = Param(self, "alpha", "alpha: parameter for Huber loss and Quantile regression (default: 0.9)")
     65         self._setDefault(alpha=0.9)

/usr/share/spark/python/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
     65             java_obj = getattr(java_obj, name)
     66         java_args = [_py2java(sc, arg) for arg in args]
---> 67         return java_obj(*java_args)
     68
     69     @staticmethod

TypeError: 'JavaPackage' object is not callable

rominokun avatar Aug 28 '20 07:08 rominokun

> Hi! I'm running into the same issue, any idea how to fix it?
>
> I'm using python 3.7, Pyspark version 2.4.4, MMLSpark version 1.0.0
>
> [screenshot]

Hi! I have the same problem now, did you fix it?

1120172175 avatar May 26 '21 02:05 1120172175

@1120172175 this looks like some issue with the cluster configuration? How are you specifying the mmlspark package? Sorry I haven't seen this issue before, I'm not sure how to resolve it.

imatiach-msft avatar May 26 '21 04:05 imatiach-msft

I wonder if this is helpful, but maybe it's not helpful at all: https://stackoverflow.com/questions/41112801/property-spark-yarn-jars-how-to-deal-with-it

> I found by hit-n-trial that correct syntax of this property is
> spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar

imatiach-msft avatar May 26 '21 04:05 imatiach-msft
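For reference, a sketch of how that property could be applied; the HDFS host and path are the placeholder values from the linked answer, not a real cluster:

# Sketch: point YARN at jars pre-staged on HDFS so every container resolves the
# same classpath without re-uploading jars per application. Globs are allowed.
# hdfs://xx:9000/... is the placeholder path from the StackOverflow answer.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("yarn")
         .config("spark.yarn.jars", "hdfs://xx:9000/user/spark/share/lib/*.jar")
         .getOrCreate())

Copying the MMLSpark jar into that HDFS directory alongside the Spark jars would then make it available to the YARN containers as well.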

What are the full file paths you are using to specify the jars?

imatiach-msft avatar May 26 '21 04:05 imatiach-msft