spark-deep-learning
sparkdl.xgboost getting stuck trying to map partitions
I am running the following code to try to fit a model:
```python
from sparkdl.xgboost import XgboostClassifier

# `data` is a Spark DataFrame with a vector column "features"
# and a label column "objective".
param = {
    "num_workers": 4,  # number of workers on the cluster, adjust as needed
    "missing": 0,
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "featuresCol": "features",
    "labelCol": "objective",
    "nthread": 32,  # equal to the number of CPUs on each worker machine
}

# randomSplit normalizes weights that don't sum to 1.0,
# so this is roughly a 50/50 split of `data`.
train, test = data.randomSplit([0.001, 0.001])

xgb_classifier = XgboostClassifier(**param)
xgb_clf_model = xgb_classifier.fit(train)
```
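The split weights in the code above may not do what they appear to. PySpark's `DataFrame.randomSplit` documents that weights are normalized if they don't sum to 1.0, so `[0.001, 0.001]` sends roughly half of `data` to `train` rather than 0.1%. The sketch below mimics only that normalization step in plain Python; `normalized_split_weights` is a hypothetical helper for illustration, not a PySpark function.

```python
from typing import List


def normalized_split_weights(weights: List[float]) -> List[float]:
    """Hypothetical helper (not part of PySpark): rescale split weights
    so they sum to 1.0, as DataFrame.randomSplit does."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


# The split from the report: each side gets about half the rows, not 0.1%.
print(normalized_split_weights([0.001, 0.001]))  # [0.5, 0.5]

# Keeping ~0.1% for train and ~0.1% for test needs a third, discarded share.
print(normalized_split_weights([0.001, 0.001, 0.998]))
```

If a small sample was the intent, something like `train, test, _ = data.randomSplit([0.001, 0.001, 0.998], seed=42)` or `data.sample(fraction=0.002, seed=42).randomSplit([0.5, 0.5], seed=42)` should keep each side near 0.1% of the rows (the seed value is arbitrary).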
When I run the model training on my Databricks cluster, it seems to get stuck when it is trying to map partitions. CPU usage is almost zero on each worker node, but memory usage slowly increases.
Is there anything I can do to get around this issue?
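The comments in the parameter dict suggest that `num_workers` × `nthread` is meant to match the cluster's total cores. A quick arithmetic check, assuming each of the `num_workers` training tasks runs at the same time and uses `nthread` cores, can rule out a configuration that asks for more CPUs than the cluster has. `training_core_check` and the cluster sizes below are hypothetical; the real node count and cores per node would come from the cluster configuration.

```python
from typing import NamedTuple


class CoreCheck(NamedTuple):
    requested: int  # total cores the training job would ask for
    available: int  # total cores across the worker nodes
    fits: bool      # True if requested <= available


def training_core_check(num_workers: int, nthread: int,
                        worker_nodes: int, cores_per_node: int) -> CoreCheck:
    """Hypothetical sanity check. Assumes every training worker runs
    concurrently and each one uses `nthread` cores."""
    if min(num_workers, nthread, worker_nodes, cores_per_node) < 1:
        raise ValueError("all arguments must be positive integers")
    requested = num_workers * nthread
    available = worker_nodes * cores_per_node
    return CoreCheck(requested, available, requested <= available)


# Values from the report (num_workers=4, nthread=32) against two
# hypothetical cluster sizes.
print(training_core_check(4, 32, worker_nodes=4, cores_per_node=32))
# CoreCheck(requested=128, available=128, fits=True)
print(training_core_check(4, 32, worker_nodes=2, cores_per_node=32))
# CoreCheck(requested=128, available=64, fits=False)
```

If `fits` comes back `False`, lowering `num_workers` or `nthread` so the product matches the cluster is one thing to try; this is only a configuration sanity check, not a confirmed cause of the hang.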