DataFrame cannot be converted to TFRecord
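```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.struct
// rdd (an RDD[Row]) and subSchema (its StructType) are defined elsewhere
val df1: DataFrame = spark.createDataFrame(rdd, subSchema)
val df2 = df1.withColumn("entity", struct("age", "salary")).
  groupBy("employee_name")
```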
df1 can be converted to TFRecord, but df2 cannot. Why is that?
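df2 is the output of the groupBy operation, which is a RelationalGroupedDataset rather than a DataFrame: it has no schema and no write method until you apply an aggregation such as agg(...) to it. There is no TFRecord equivalent for this kind of object. On top of that, TFRecord supports only a very limited schema, because it is designed as an input format for TensorFlow model training.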
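A minimal sketch of the point above, assuming a local SparkSession and a few illustrative rows taken from the question's output: groupBy on its own only gives a RelationalGroupedDataset, and an aggregation such as collect_list turns it back into a DataFrame. The df3 variant (one flat array per field instead of an array of structs) is a hypothetical workaround for TFRecord connectors that reject struct columns, not something stated in the answer.

```scala
import org.apache.spark.sql.{DataFrame, RelationalGroupedDataset, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Local session for the sketch; in spark-shell getOrCreate returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("groupby-agg").getOrCreate()
import spark.implicits._

// Illustrative rows taken from the question's output
val df1: DataFrame = Seq(
  ("Jen", 53, 79000),
  ("Michael", 56, 86000),
  ("Maria", 24, 90000),
  ("Jeff", 25, 80000)
).toDF("employee_name", "age", "salary")

// groupBy alone returns a RelationalGroupedDataset: no schema, no .write
val grouped: RelationalGroupedDataset =
  df1.withColumn("entity", struct("age", "salary")).groupBy("employee_name")

// An aggregation turns it back into a DataFrame with an array<struct<age,salary>> column
val df2: DataFrame = grouped.agg(collect_list(col("entity")).as("entities"))
df2.printSchema()

// Hypothetical workaround: one flat array column per field instead of an array of structs
val df3: DataFrame = df1.groupBy("employee_name").agg(
  collect_list(col("age")).as("ages"),
  collect_list(col("salary")).as("salaries"))
df3.printSchema()
```

Whether the array<struct> column in df2 can then be written as TFRecord depends on the connector you use; if it rejects struct elements, flat array columns like those in df3 are closer to the simple schema TFRecord is built for.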
How can I design a schema like the one below? I want to convert an RDD into a DataFrame using a StructType that already contains the result of the groupBy operation:

```
root
 |-- name: string (nullable = true)
 |-- age: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: string (nullable = true)
 |    |    |-- _3: integer (nullable = false)
```
```
+-------------+--------------------+
|employee_name|collect_list(entity)|
+-------------+--------------------+
|          Jen|       [[53, 79000]]|
|      Michael|[[56, 86000], [30...|
|        Kumar|[[34, 90000], [50...|
|        Maria|       [[24, 90000]]|
|        Raman|[[40, 99000], [36...|
|         Jeff|       [[25, 80000]]|
+-------------+--------------------+
```
If you already have the RDD, you can define the schema as a StructType and then pass both to createDataFrame, as shown here:
https://stackoverflow.com/questions/29383578/how-to-convert-rdd-object-to-dataframe-in-spark
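A sketch of that approach applied to the nested schema from the follow-up question, assuming the RDD holds one Row per name whose second field is a Seq of Rows. The field values below are hypothetical placeholders, since the question does not say what _1, _2 and _3 contain.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("nested-schema").getOrCreate()

// Element type of the "age" array, matching the printSchema output in the question
val elementType = StructType(Seq(
  StructField("_1", StringType, nullable = true),
  StructField("_2", StringType, nullable = true),
  StructField("_3", IntegerType, nullable = false)
))

// Top-level schema: name plus an array of those structs
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", ArrayType(elementType, containsNull = true), nullable = true)
))

// Hypothetical data: an array column holds a Seq, and each struct element is a Row
val rdd = spark.sparkContext.parallelize(Seq(
  Row("Jen", Seq(Row("a", "b", 1))),
  Row("Michael", Seq(Row("c", "d", 2), Row("e", "f", 3)))
))

val df: DataFrame = spark.createDataFrame(rdd, schema)
df.printSchema()
```

Building Rows by hand like this is mainly useful when the RDD already exists; if the data starts out as a flat DataFrame, groupBy(...).agg(collect_list(...)) produces the nested column directly.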