SynapseML
LightGBMClassifier fit on yarn makes no progress
**Describe the bug**
- pipeline_model.fit() makes no progress; the Spark stage stays at 0
- CSV data: 200 columns, 800,000 rows
- Training the same data on a single CentOS machine takes about 1 minute (see the baseline sketch below)
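For reference, the single-machine run mentioned above would look roughly like the following. This is only a hypothetical sketch using plain LightGBM and pandas, not code from the report:

# Hypothetical single-machine baseline (assumed: plain LightGBM on the same CSV)
import pandas as pd
import lightgbm as lgbm

df = pd.read_csv("/model_data.csv")
X, y = df.drop(columns=["label"]), df["label"]

clf = lgbm.LGBMClassifier(objective="binary", max_depth=3,
                          n_estimators=800, learning_rate=0.1)
clf.fit(X, y)  # reported to finish in about 1 minute on one machine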
**To Reproduce**
spark2-submit --master yarn --jars file:///root/.ivy2/jars/com.microsoft.ml.spark_mmlspark_2.11-1.0.0-rc1.jar,file:///root/.ivy2/jars/com.microsoft.ml.lightgbm_lightgbmlib-2.3.100.jar --conf spark.pyspark.python=/usr/lib/anaconda2/envs/mmlspark/bin/python --num-executors 20 --executor-memory 10G test_mmlspark2.py
**Expected behavior**
Training on YARN should make progress and complete, just as the single-machine run does.
**Info (please complete the following information):**
- MMLSpark Version: 1.0.0-rc1
- Spark Version: 2.4.5
- Spark Platform: CDH
**Stacktrace**
client:
[Stage 6:> (0 + 20) / 20]
web UI: (Spark UI screenshot, not reproduced here)

yarn log:
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Info] Trying to bind port 12456...
[LightGBM] [Info] Binding port 12456 succeeded
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Info] Listening...
(the "Set TCP_NODELAY failed" warning above repeats 36 more times)
[LightGBM] [Info] Connected to rank 0
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Connected to rank 2
[LightGBM] [Info] Connected to rank 3
[LightGBM] [Info] Connected to rank 4
[LightGBM] [Info] Connected to rank 5
[LightGBM] [Info] Connected to rank 6
[LightGBM] [Info] Connected to rank 7
[LightGBM] [Info] Connected to rank 8
[LightGBM] [Info] Connected to rank 9
[LightGBM] [Info] Connected to rank 10
[LightGBM] [Info] Connected to rank 11
[LightGBM] [Info] Connected to rank 12
[LightGBM] [Info] Connected to rank 13
[LightGBM] [Info] Connected to rank 14
[LightGBM] [Info] Connected to rank 15
[LightGBM] [Info] Connected to rank 16
[LightGBM] [Info] Connected to rank 17
[LightGBM] [Info] Connected to rank 19
[LightGBM] [Info] Local rank: 18, total number of machines: 20
[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 173288, number of negative: 660791
[LightGBM] [Info] Total Bins 9954
[LightGBM] [Info] Number of data: 41899, number of used features: 196
[LightGBM] [Debug] Use subset for bagging
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.204635 -> initscore=-1.357573
[LightGBM] [Info] Start training from score -1.340715
[LightGBM] [Debug] Re-bagging, using 29329 data to train
[LightGBM] [Debug] Trained a tree with leaves = 8 and max_depth = 3
(the "Re-bagging, using 29329 data to train" / "Trained a tree with leaves = 8 and max_depth = 3" pair above repeats for another 35 iterations; the log excerpt ends with a final Re-bagging line)
**Python code**
# coding=UTF-8
import numpy as np
import pyspark

spark = pyspark.sql.SparkSession.builder.appName("spark lightgbm") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:0.18.1") \
    .getOrCreate()
spark.conf.set("spark.executor.memory", '18g')
spark.conf.set("spark.executor.cores", '20')
spark.conf.set("spark.default.parallelism", '300')
spark.conf.set("spark.cores.max", '30')
spark.conf.set("spark.driver.memory", '18g')
spark.conf.set("spark.yarn.executor.memoryOverhead", '10g')

import mmlspark
from mmlspark.lightgbm import LightGBMClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline

df_train = spark.read.format("csv") \
    .option("inferSchema", "true") \
    .option("header", "true") \
    .option("sep", ",") \
    .load("/model_data.csv")

feature_cols = list(df_train.columns)
feature_cols.remove("label")
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")

# cast every column to float, then replace missing values with 0
for colName in df_train.columns:
    print(colName)
    df_train = df_train.withColumn(colName, df_train[colName].cast('float'))
df_train = df_train.na.fill(0)

lgb = LightGBMClassifier(
    objective="binary",
    boostingType='gbdt',
    isUnbalance=True,
    featuresCol='features',
    labelCol='label',
    maxBin=60,
    baggingFreq=1,
    baggingSeed=696,
    earlyStoppingRound=30,
    learningRate=0.1,
    lambdaL1=1.0,
    lambdaL2=45.0,
    maxDepth=3,
    numLeaves=128,
    baggingFraction=0.7,
    featureFraction=0.7,
    # minSumHessianInLeaf=1,
    numIterations=800,
    verbosity=30
)

stages = [assembler, lgb]
pipeline_model = Pipeline(stages=stages)
print("**********fit***************")
model = pipeline_model.fit(df_train)
print("**********transform***************")
train_preds = model.transform(df_train)
Encountering the same issue.
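One mitigation sometimes suggested for this symptom (not verified here) is to repartition the training data so the partition count matches the number of executors and to cache it before calling fit(), since LightGBM's distributed training generally needs all of the partition tasks to start and connect to each other at the same time. A minimal sketch against the script above; the partition count of 20 is only an assumption taken from --num-executors:

# Hypothetical workaround sketch, not part of the original report.
# Assumes df_train and pipeline_model are defined as in the script above.
num_partitions = 20  # assumed to match --num-executors in the submit command

df_train = df_train.repartition(num_partitions).cache()
df_train.count()  # materialize the repartitioned data before training starts

model = pipeline_model.fit(df_train)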