spark-deep-learning
Confusion regarding usage of pyspark.mllib
The examples use the pyspark.ml library, but what if I have to do the classification with an SVM via the pyspark.mllib library? How can I implement the 'fit()' function then?
I was finally able to do it with the following code:
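from pyspark.ml import linalg as ml_linalg
from pyspark.mllib.linalg import Vectors as MLLibVectors
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import SVMWithSGD

p = featurizer.transform(train_df)
# Convert from DataFrame to RDD
data = p.rdd

def as_mllib(v):
    """Convert a pyspark.ml.linalg.DenseVector into a pyspark.mllib.linalg Vector."""
    if isinstance(v, ml_linalg.DenseVector):
        return MLLibVectors.dense(v.toArray())
    else:
        raise TypeError("Unsupported type: {0}".format(type(v)))

# Wrap each row as an mllib LabeledPoint, then train the SVM
data = data.map(lambda row: LabeledPoint(row.label, as_mllib(row.features)))
model = SVMWithSGD.train(data, iterations=1)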
Is there a simpler way to do this?
@jkbradley is there a simpler way?
@a7b23 the SVM interface is indeed not available yet in spark.ml, but it will be in Spark 2.2.
@thunterdb So by converting the features to an RDD and then training the SVM on it, am I compromising performance in terms of speed?