spark-tfrecord

Do we support converting a Spark DataFrame directly to a tensorflow-java Tensor such as TFloat32, TFloat64, or Operand[T <: TNumber]?

Open • mullerhai opened this issue 3 years ago • 1 comment

Hi, spark-tfrecord is a great project, but so far I only know how to use it to read and write TFRecord files with Spark DataFrames. In practice we also need to convert a DataFrame directly into tensorflow-java tensors such as TFloat32, TFloat64, or Operand[T <: TNumber], so that the tensor data can be used as training input for a TensorFlow model, much as Spark uses org.apache.spark.ml.linalg.Vector.

mullerhai avatar Jun 10 '22 06:06 mullerhai

I don't understand your use case. Can you elaborate on how you plan to use Spark-TFRecord with Tensorflow-Java?

Spark-TFRecord is designed as a Spark data source: it handles format conversion between TFRecord and Spark DataFrame, and that conversion happens during read and write operations. Once you have read the data in, you can process it as a regular DataFrame. If I understand correctly, your request has nothing to do with TFRecord: you could read a dataset in Avro, Parquet, or CSV format and then want to convert the DataFrame to tensorflow-java format in memory (instead of storing it in TFRecord format on disk)? That is out of scope for Spark-TFRecord.

junshi15 avatar Jun 10 '22 13:06 junshi15
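
For readers who land here with the same need, the in-memory conversion described above can be sketched outside Spark-TFRecord. The following is a minimal, unofficial sketch, not part of this project: it assumes Spark SQL and tensorflow-java (the org.tensorflow.ndarray and org.tensorflow.types packages) are on the classpath, and the names DataFrameToTensor, toRowMajor, toTensor, and featureCols are hypothetical. It collects the selected columns to the driver, so it only suits batches that fit in driver memory.

```scala
import java.nio.FloatBuffer

import org.apache.spark.sql.DataFrame
import org.tensorflow.ndarray.Shape
import org.tensorflow.ndarray.buffer.DataBuffers
import org.tensorflow.types.TFloat32

// Hypothetical helper, not provided by Spark-TFRecord or tensorflow-java.
object DataFrameToTensor {

  // Flatten equally sized rows into a row-major buffer plus a [rows, cols] shape.
  def toRowMajor(rows: Array[Array[Float]]): (Array[Long], Array[Float]) = {
    val cols = rows.headOption.map(_.length).getOrElse(0)
    require(rows.forall(_.length == cols), "all rows must have the same length")
    (Array(rows.length.toLong, cols.toLong), rows.flatten)
  }

  // Collect the given numeric columns to the driver and copy them into a
  // rank-2 TFloat32 tensor. Assumes the columns contain no nulls, since
  // Row.getFloat fails on null values.
  def toTensor(df: DataFrame, featureCols: Seq[String]): TFloat32 = {
    val rows: Array[Array[Float]] = df
      .select(featureCols.map(c => df(c).cast("float")): _*)
      .collect()
      .map(r => Array.tabulate(r.length)(i => r.getFloat(i)))

    val (shape, data) = toRowMajor(rows)
    // tensorOf copies the buffer contents into newly allocated tensor memory.
    TFloat32.tensorOf(Shape.of(shape: _*), DataBuffers.of(FloatBuffer.wrap(data)))
  }
}
```

The returned tensor holds native memory, so the caller should close it when done (for example with scala.util.Using). The DataFrame passed in can come from any source, including one loaded through Spark-TFRecord.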