spark-deep-learning

KerasImageFileEstimator API does not work with the dataset built as explained in keras_image_file_estimator.py

Open · demetsude opened this issue 7 years ago · 1 comment

Hi, I am using the sparkdl module from Databricks and trying to run an application with KerasImageFileEstimator. I am following the example in keras_image_file_estimator.py, which builds the dataset like this:

    stringIndexer = StringIndexer(inputCol="imageLabel", outputCol="categoryIndex")
    indexed_dateset = stringIndexer.fit(original_dataset).transform(original_dataset)
    encoder = OneHotEncoder(inputCol="categoryIndex", outputCol="categoryVec")
    image_dataset = encoder.transform(indexed_dateset)

When I run

    transformers = estimator.fit(image_dataset)

I get this error:

    _keras_label = row[label_col].array
    AttributeError: 'SparseVector' object has no attribute 'array'

The error is raised from the _getNumpyFeaturesAndLabels function in keras_image_file_estimator.py. As far as I understand, the problem is that OneHotEncoder returns a SparseVector (categoryVec), and row[label_col] is therefore a SparseVector, which, unlike a DenseVector, has no attribute called array.

I could not find a solution, so any help would be appreciated.

demetsude, Mar 27 '18

Thanks for raising this issue! Based on the context and the error message, we think https://github.com/databricks/spark-deep-learning/pull/125 should fix it. If you need an immediate workaround, make sure the column named by your estimator's labelCol Param ("categoryVec" in your example?) is a DenseVector, for example by applying a custom UDF to it.

yogeshg, May 03 '18