SynapseML
Does mmlspark support running on yarn?
If I can't run LightGBM on YARN, the original version rather than the Spark version may be the best choice.
@janelu9 yes, it should; hope that resolves your question. Is this question specifically related to one of the estimators (e.g. LIME, LightGBM, KNN, CNTKModel, etc.)?
I can't import mmlspark in pyspark when the master is yarn.
The command is like:
pyspark --master yarn --jars file:///root/.ivy2/jars/*
But I can succeed in spark-shell:
spark-shell --master yarn --jars file:///root/.ivy2/jars/*
How can I import mmlspark successfully if I just have the jars and can't connect to the Internet?
root@DESKTOP-OPMDKT7:~/.ivy2# spark-shell --master yarn --jars file:///root/.ivy2/jars/*
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = yarn, app id = application_1578918883660_0002).
Spark session available as 'spark'.
Welcome to Spark version 2.3.1
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.microsoft.ml.spark.lightgbm._
import com.microsoft.ml.spark.lightgbm._
root@DESKTOP-OPMDKT7:~/.ivy2# pyspark --master yarn --jars file:///root/.ivy2/jars/*
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to Spark version 2.3.1
Using Python version 3.7.0 (default, Jun 28 2018 13:15:42)
SparkSession available as 'spark'.

>>> from mmlspark.lightgbm import LightGBMRegressor as lgb
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'mmlspark'
@imatiach-msft I tried to start pyspark this way:
pyspark --master yarn \
--conf spark.dist.pyFiles=file:///....jar \
--conf spark.submit.pyFiles=file:///....jar \
--conf spark.yarn.dist.jars=file:///....jar
It still can't be used normally, though mmlspark can be imported:
Using Python version 3.7.0 (default, Jun 28 2018 13:15:42)
SparkSession available as 'spark'.
>>> from mmlspark.lightgbm import LightGBMRegressor as lgb
>>> lgb()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/e/bash/opt/spark/python/pyspark/__init__.py", line 105, in wrapper
return func(self, **kwargs)
File "/tmp/spark-47f3ea34-af3b-4ab9-a424-882de6e9b7bf/userFiles-a7e893d1-bcdc-40e8-bf88-8c43fd10d5b0/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar/mmlspark/lightgbm/_LightGBMRegressor.py", line 63, in __init__
File "/mnt/e/bash/opt/spark/python/pyspark/ml/wrapper.py", line 63, in _new_java_obj
return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
>>>
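For what it's worth, TypeError: 'JavaPackage' object is not callable usually means py4j could not find the Scala class on the driver's JVM classpath, i.e. the jar reached the Python side (so the import works) but not the JVM. A quick diagnostic sketch to run in the pyspark shell (purely illustrative; the class name is the one from the traceback above):

# Raises a Py4JJavaError wrapping ClassNotFoundException if the mmlspark jar
# is not on the driver JVM classpath; returns a Class object if it is.
spark.sparkContext._jvm.java.lang.Class.forName(
    "com.microsoft.ml.spark.lightgbm.LightGBMRegressor")

If that call fails, the jar probably needs to be passed via --jars (or spark.jars) in addition to the pyFiles settings, so that both the JVM and the Python interpreter can see it.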
Hi! I'm running into the same issue, any idea how to fix it?
I'm using Python 3.7, PySpark 2.4.4, MMLSpark 1.0.0

mmlspark 2.11-0.18.1
/usr/share/spark/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
    108             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    109         self._input_kwargs = kwargs
--> 110         return func(self, **kwargs)
    111     return wrapper
    112

/usr/local/lib/python3.6/site-packages/mmlspark/lightgbm/_LightGBMRegressor.py in __init__(self, alpha, baggingFraction, baggingFreq, baggingSeed, boostFromAverage, boostingType, categoricalSlotIndexes, categoricalSlotNames, defaultListenPort, earlyStoppingRound, featureFraction, featuresCol, initScoreCol, isProvideTrainingMetric, labelCol, lambdaL1, lambdaL2, learningRate, maxBin, maxDepth, minSumHessianInLeaf, modelString, numBatches, numIterations, numLeaves, objective, parallelism, predictionCol, timeout, tweedieVariancePower, useBarrierExecutionMode, validationIndicatorCol, verbosity, weightCol)
     61     def __init__(self, alpha=0.9, baggingFraction=1.0, baggingFreq=0, baggingSeed=3, boostFromAverage=True, boostingType="gbdt", categoricalSlotIndexes=None, categoricalSlotNames=None, defaultListenPort=12400, earlyStoppingRound=0, featureFraction=1.0, featuresCol="features", initScoreCol=None, isProvideTrainingMetric=False, labelCol="label", lambdaL1=0.0, lambdaL2=0.0, learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, modelString="", numBatches=0, numIterations=100, numLeaves=31, objective="regression", parallelism="data_parallel", predictionCol="prediction", timeout=1200.0, tweedieVariancePower=1.5, useBarrierExecutionMode=False, validationIndicatorCol=None, verbosity=1, weightCol=None):
     62         super(_LightGBMRegressor, self).__init__()
---> 63         self._java_obj = self._new_java_obj("com.microsoft.ml.spark.lightgbm.LightGBMRegressor")
     64         self.alpha = Param(self, "alpha", "alpha: parameter for Huber loss and Quantile regression (default: 0.9)")
     65         self._setDefault(alpha=0.9)

/usr/share/spark/python/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
     65         java_obj = getattr(java_obj, name)
     66         java_args = [_py2java(sc, arg) for arg in args]
---> 67         return java_obj(*java_args)
     68
     69     @staticmethod
TypeError: 'JavaPackage' object is not callable
Hi! I have the same problem now, did you fix it?
@1120172175 this looks like some issue with the cluster configuration? How are you specifying the mmlspark package? Sorry I haven't seen this issue before, I'm not sure how to resolve it.
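In case it helps as a baseline, the mmlspark package is normally specified through Spark's --packages mechanism, e.g. as below (a sketch only; the version is taken from the jar name in the traceback above, and the extra repository flag is only needed if the artifact is not resolvable from your default resolvers). This resolves the artifact over the network, so it covers the online case only:

pyspark --master yarn \
  --packages com.microsoft.ml.spark:mmlspark_2.11:0.18.1 \
  --repositories https://mmlspark.azureedge.net/maven

Comparing the jars this pulls into ~/.ivy2/jars against what is being shipped manually might show which dependency is missing.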
I wonder if this is helpful, but maybe it's not helpful at all: https://stackoverflow.com/questions/41112801/property-spark-yarn-jars-how-to-deal-with-it
I found by trial and error that the correct syntax of this property is
spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar
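Following that pattern, here is a sketch of what it could look like for a cluster without Internet access (the HDFS host, port, and directory are placeholders, not values anyone in this thread confirmed):

# One-time: copy Spark's own jars plus the locally resolved mmlspark jars into HDFS
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put $SPARK_HOME/jars/*.jar /user/spark/share/lib/
hdfs dfs -put /root/.ivy2/jars/*.jar /user/spark/share/lib/

# Launch against the HDFS copies instead of file:/// paths
pyspark --master yarn \
  --conf spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar

Note that when spark.yarn.jars is set it replaces the default distribution of Spark's jars, so Spark's own jars must be included, and the Python side of mmlspark would still need --py-files as discussed above.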
What are the full file paths you are using to specify the jars?