How to convert a Spark DataFrame directly to an MLeap Tensor[Double]

Open mullerhai opened this issue 3 years ago • 3 comments

Hi, I know how to convert an MLeap tensor to a TensorFlow tensor using our package, but I don't know how to convert a Spark DataFrame to an MLeap Tensor[Double]. I found the method TypeConverters.sparkToMleapValue(), but I don't know how to use it. Could you provide a tutorial for this? Thanks.

mullerhai avatar Jun 29 '22 08:06 mullerhai

Using org.apache.spark.sql.mleap.TypeConverters:

  def sparkToMleapConverter(dataset: DataFrame,
                            field: StructField): (types.StructField, (Any) => Any) = {
    (sparkFieldToMleapField(dataset, field), sparkToMleapValue(field.dataType))
  }

I cannot get a Tensor[Double] from a Spark DataFrame.

mullerhai avatar Jun 29 '22 09:06 mullerhai

As a caveat, converting from spark to mleap is kind of an unusual thing which we don't usually need to do. If you have a spark session and dataframe, then just do things with spark. mleap runtime is more for when you don't have a spark session (e.g., in a real time inference service).

That said, sparkToMleapConverter is the way to convert a single field if you really need to. If you need to convert the entire dataframe, then toSparkLeapFrame is probably easier. Take a look at the toSparkLeapFrame code to see how it uses sparkToMleapConverter. toSparkLeapFrame becomes available on a dataframe just by adding import ml.combust.mleap.spark.SparkSupport._.

Looking at the sparkFieldToMleapField code, the column needs to be a spark VectorUDT, MatrixUDT, or an Array[VectorUDT] in order for it to be converted to an mleap tensor.
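
The two routes described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not a verified recipe: the column names a, b, c and features are hypothetical, VectorAssembler is used only as one way to obtain a VectorUDT column, and the accessors on the MLeap side (dataset, getTensor, dimensions, toArray) as well as TypeConverters being callable as an object should be checked against the MLeap version in use.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.mleap.TypeConverters
import ml.combust.mleap.spark.SparkSupport._
import ml.combust.mleap.tensor.Tensor

object DataFrameToTensor {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("df-to-mleap-tensor")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical numeric columns. Plain Double columns are not tensors on their own.
    val raw = Seq((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)).toDF("a", "b", "c")

    // Pack the columns into a single VectorUDT column, which is one of the
    // types the thread says can be mapped to an MLeap tensor.
    val withVector = new VectorAssembler()
      .setInputCols(Array("a", "b", "c"))
      .setOutputCol("features")
      .transform(raw)
      .select("features")

    // Route 1: convert the whole dataframe, then read column 0 of each row as a tensor.
    // Assumes SparkLeapFrame exposes its rows as `dataset` and Row has `getTensor`.
    val leapFrame = withVector.toSparkLeapFrame
    val tensors: Array[Tensor[Double]] =
      leapFrame.dataset.map(_.getTensor[Double](0)).collect()
    tensors.foreach(t => println(s"${t.dimensions} -> ${t.toArray.mkString(", ")}"))

    // Route 2: convert a single field with the converter quoted earlier in the thread.
    // It returns the MLeap field plus an Any => Any function for individual values.
    val (mleapField, convert) =
      TypeConverters.sparkToMleapConverter(withVector, withVector.schema("features"))
    val first = convert(withVector.head().get(0)).asInstanceOf[Tensor[Double]]
    println(s"$mleapField -> ${first.toArray.mkString(", ")}")

    spark.stop()
  }
}
```

Selecting only the vector column before converting keeps the tensor at index 0, so no schema lookup is needed on the MLeap side.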

jsleight avatar Jul 01 '22 15:07 jsleight

OK, thanks. Now, using our package, I can generate tensorflow-java Tensor and NdArray objects from a Spark DataFrame as expected!

mullerhai avatar Jul 02 '22 04:07 mullerhai