spark-deep-learning

Confusion regarding usage of pyspark.mllib

Open a7b23 opened this issue 7 years ago • 4 comments

In the examples you are using the pyspark.ml library, but what if I have to do the classification using SVM via the pyspark.mllib library? How can I implement the 'fit()' function then?

a7b23 avatar Jun 22 '17 12:06 a7b23

I was finally able to do it with the following code:

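    # Imports inferred from the names used below
    from pyspark.ml import linalg as ml_linalg
    from pyspark.mllib.classification import SVMWithSGD
    from pyspark.mllib.linalg import Vectors as MLLibVectors
    from pyspark.mllib.regression import LabeledPoint

    # Featurize, then convert from the DataFrame to the RDD data type
    p = featurizer.transform(train_df)
    data = p.rdd

    def as_mllib(v):
        """Convert a pyspark.ml.linalg.DenseVector into a pyspark.mllib.linalg Vector."""
        if isinstance(v, ml_linalg.DenseVector):
            return MLLibVectors.dense(v.toArray())
        else:
            raise TypeError("Unsupported type: {0}".format(type(v)))

    # Wrap each row as a LabeledPoint and train the mllib SVM
    data = data.map(lambda row: LabeledPoint(row.label, as_mllib(row.features)))
    model = SVMWithSGD.train(data, iterations=1)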

Is there a simpler way to do this?

a7b23 avatar Jun 22 '17 13:06 a7b23

@jkbradley is there a simpler way?

sueann avatar Jun 23 '17 20:06 sueann

@a7b23 the SVM interface is indeed not available yet in spark.ml, but it will be in Spark 2.2.

thunterdb avatar Jun 23 '17 23:06 thunterdb

@thunterdb So by converting the features to the RDD data type and then applying SVM to it, am I compromising performance in terms of speed?

a7b23 avatar Jun 24 '17 12:06 a7b23