Consume TensorFlowInferSchema from Java Spark 2.4.0
This is not an issue and is more of a request for guidance. I'd like to just use the TensorFlowInferSchema functionality in my Java spark job. Below is the sample snippet
JavaRDD<Example> exampleRdd =
jsc.parallelize(
Arrays.asList(
Example.newBuilder()
.setFeatures(
Features.newBuilder()
.putFeature(
"Test",
Feature.newBuilder()
.setInt64List(Int64List.newBuilder().addValue(10).build())
.build())
.build())
.build()));
StructType schema = TensorFlowInferSchema.apply(rdd.rdd(), ???);
However, the apply method expects a second argument - implicit evidence$1 : scala.reflect.runtime.universe.TypeTag[T].
Below is the compiled Java class
def apply[T](rdd : org.apache.spark.rdd.RDD[T])(implicit evidence$1 : scala.reflect.runtime.universe.TypeTag[T]) : org.apache.spark.sql.types.StructType = { /* compiled code */ }
I'm not familiar with Scala and it appears that it's not possible to create TypeTag[Example] in java.
Appreciate if you could share your thoughts on the below
- Is it safe (and possible) to just consume
TensorFlowInferSchemain Java without going viaspark.read.format("tfrecord").option("recordType", "Example")? - What to pass for the
TypeTag[T]argument? - Is there a Java example for this use case
Thanks for maintaining this project and highly appreciate your help!
"TensorFlowInferSchema" only infers the schema. Is that what you need?
I am not familiar with Spark Java API, but can you call scala function from Java?
If you want to read/write TFRecord with Java API, I assume something like this will work.
Dataset<Row> usersDF = spark.read().format("tfrecord").load("examples/src/main/resources/users.tfrecord"); usersDF.select("name", "favorite_color").write().format("tfrecord").save("namesAndFavColors.tfrecord");
Thanks for the example on loading - Yes, that should help to read tfrecord files into a DataFrame. However, as you point out, I just need automatic schema inference as we already build a JavaRDD<Example>. My understanding is Java classes can consume Scala classes and vice versa but I'm not familiar enough to understand if TensorFlowInferSchema can be invoked.
I was hoping to get some help there.
Thanks!