Error: java.lang.ClassCastException: com.linkedin.spark.shaded.org.tensorflow.example.FeatureList cannot be cast to com.linkedin.spark.shaded.org.tensorflow.example.Feature

Open nitinware opened this issue 3 years ago • 3 comments

I am trying to write a Spark DataFrame in TFRecord format with df.write.mode("overwrite").format("tfrecord").option("recordType", "tfrecords").save(outputPath + '/tf-records/'). I am running on a GCP Dataproc cluster, which ships with Spark 3.1.2, and I am using the spark-tfrecord jar spark-tfrecord_2.12-0.3.4.jar.

I see the error below on the write operation:

22/01/21 05:33:13 ERROR org.apache.spark.util.Utils: Aborting task
java.lang.IllegalArgumentException: Unsupported recordType tfrecords: recordType can be Example or SequenceExample
	at com.linkedin.spark.datasources.tfrecord.TFRecordOutputWriter.write(TFRecordOutputWriter.scala:33)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:140)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:278)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:286)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:210)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)

I would appreciate your input on this issue. Thanks.

nitinware avatar Jan 21 '22 05:01 nitinware

The error message is clear: recordType can be Example or SequenceExample.

Instead of .option("recordType", "tfrecords"), you should use .option("recordType", "Example") or .option("recordType", "SequenceExample").

Please take a look at the README file. https://github.com/linkedin/spark-tfrecord#features
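
For reference, a minimal sketch of the corrected write call in PySpark, assuming df and outputPath are defined as in the original post and the spark-tfrecord jar is on the classpath. The write_tfrecord helper and its up-front check are hypothetical and not part of spark-tfrecord; the two allowed values are taken from the error message above.

```python
# Allowed values, taken from the error message in this thread.
VALID_RECORD_TYPES = ("Example", "SequenceExample")


def write_tfrecord(df, path, record_type="Example"):
    """Write df as TFRecord files, rejecting a bad recordType before any task runs.

    Hypothetical helper: df is expected to be a Spark DataFrame, and the
    "tfrecord" format is only available when the spark-tfrecord jar is loaded.
    """
    if record_type not in VALID_RECORD_TYPES:
        raise ValueError(
            f"Unsupported recordType {record_type!r}; "
            f"expected one of {VALID_RECORD_TYPES}"
        )
    (
        df.write.mode("overwrite")
        .format("tfrecord")
        .option("recordType", record_type)
        .save(path)
    )
```

With this in place, the original call becomes write_tfrecord(df, outputPath + "/tf-records/", "Example"), and a value such as "tfrecords" fails on the driver instead of inside a task.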

junshi15 avatar Jan 21 '22 05:01 junshi15

Thanks for the quick response. I am seeing the error below now and would appreciate your input:

java.lang.ClassCastException: com.linkedin.spark.shaded.org.tensorflow.example.FeatureList cannot be cast to com.linkedin.spark.shaded.org.tensorflow.example.Feature
	at com.linkedin.spark.datasources.tfrecord.TFRecordSerializer.$anonfun$serializeExample$1(TFRecordSerializer.scala:22)
	at com.linkedin.spark.datasources.tfrecord.TFRecordSerializer.$anonfun$serializeExample$1$adapted(TFRecordSerializer.scala:19)
	at scala.collection.immutable.Range.foreach(Range.scala:158)
	at com.linkedin.spark.datasources.tfrecord.TFRecordSerializer.serializeExample(TFRecordSerializer.scala:19)
	at com.linkedin.spark.datasources.tfrecord.TFRecordOutputWriter.write(TFRecordOutputWriter.scala:29)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:140)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:278)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:286)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:210)

nitinware avatar Jan 21 '22 06:01 nitinware

I am guessing your data is "SequenceExample", but you are trying to write it as "Example".
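
One way to check this guess before writing is to inspect the column types. The sketch below assumes that a column typed as an array of arrays is what gets serialized as a FeatureList, and therefore needs recordType SequenceExample; that mapping is an assumption to verify against the README. pick_record_type is a hypothetical helper that takes the list of (name, type string) pairs returned by a PySpark DataFrame's dtypes attribute.

```python
def pick_record_type(dtypes):
    """Return "SequenceExample" if any column is an array of arrays, else "Example".

    dtypes is a list of (column_name, type_string) pairs, the shape
    returned by a PySpark DataFrame's dtypes attribute.
    Assumption: nested arrays are the columns that require SequenceExample.
    """
    for _name, type_string in dtypes:
        # Type strings look like "array<array<float>>" for nested arrays.
        if type_string.replace(" ", "").startswith("array<array<"):
            return "SequenceExample"
    return "Example"
```

Calling pick_record_type(df.dtypes) and passing the result to .option("recordType", ...) would then select the record type from the schema rather than hard-coding it.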

junshi15 avatar Jan 21 '22 06:01 junshi15