DeepImageFeaturizer doesn't seem to work
I'm reading my images from a parquet file that I created earlier; it contains the image path as well as the image itself, following the imageSchema. I'm using sparkdl version 0.2.0 and trying to feed the images read from the parquet file (hosted on S3) into the DeepImageFeaturizer, but I keep getting the following error:
AttributeError: 'NoneType' object has no attribute 'mode'
My code looks something like this:
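from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from sparkdl import DeepImageFeaturizer
# `sc` is assumed to be a SparkSession here (a SparkContext has no .read)
test_image_df = sc.read.parquet("s3://<bucket_name>/test_images2.parquet").repartition(num_workers)
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="ResNet50")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
p = Pipeline(stages=[featurizer, lr])
model = p.fit(test_image_df)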
@MrBago
Can you shed some light on this? I've been working on a project for a couple of months now and I'm stuck on this. I've traced through the source files and nothing seems off. I've printed out the test_image_df schema as well as its columns and rows, so I'm sure the data has been read correctly.
The error traces back to this line:
File "./databricks_spark-deep-learning-0.2.0-spark2.1-s_2.11.jar/sparkdl/image/imageIO.py", line 111, in imageType
return sparkModeLookup[imageRow.mode]
I'm hitting the same problem. Does anyone have a solution?
My guess is that your image data includes non-image files or image files in an unsupported format. AttributeError: 'NoneType' object has no attribute 'mode' means that we're trying to access .mode on a None object. The line return sparkModeLookup[imageRow.mode] makes me think imageRow is None in some cases. This can happen if there are nulls in the image column of your dataframe. Try filtering your data to drop rows with a null image before passing the data to the featurizer. You can also look at the path of any rows that contain null images to see why we were not able to read them.
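As a rough sketch of that filtering step (not tested against your data): the helper name split_null_images is made up for illustration, and I'm assuming the image column is called "image" as in your featurizer. It only uses standard DataFrame/Column methods (filter, isNull, isNotNull), so it needs no extra imports.

```python
def split_null_images(df, image_col="image"):
    """Split a Spark DataFrame into rows with and without a loaded image.

    `split_null_images` is a hypothetical helper, not part of sparkdl.
    Returns (valid_df, null_df).
    """
    # Rows whose image struct was read successfully
    valid_df = df.filter(df[image_col].isNotNull())
    # Rows where the image could not be read (the ones that trigger the error)
    null_df = df.filter(df[image_col].isNull())
    return valid_df, null_df


# Usage sketch, assuming the test_image_df and pipeline `p` from the question.
# "filePath" is a guess at the name of your path column; adjust as needed.
#
#   valid_df, null_df = split_null_images(test_image_df)
#   print(null_df.count())                           # how many images failed to load
#   null_df.select("filePath").show(truncate=False)  # which files they were
#   model = p.fit(valid_df)
```

If null_df turns out to be empty, the nulls are not the cause and the problem is somewhere else.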