TransmogrifAI
Dataframe Encoders for TransmogrifAI types
Problem
Currently TransmogrifAI implements a number of custom functions to encode/decode TransmogrifAI types to/from Spark DataFrame native types (see FeatureSparkTypes, FeatureTypeSparkConverter and FeatureTypeFactory). This approach requires applying converters every time values are encoded to or decoded from a Spark DataFrame.
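For illustration, a minimal round trip using the public feature-types API (a sketch; Real, toReal and value come from com.salesforce.op.features.types) shows the per-value boxing involved today:

import com.salesforce.op.features.types._

// Reading boxes each raw Spark value into a TransmogrifAI wrapper...
val real: Real = 1.0.toReal // Double => Real (wraps an Option[Double])
// ...and writing unboxes it again, value by value, row by row.
val raw: Option[Double] = real.value // Real => Option[Double]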
Solution
We need a proper implementation of org.apache.spark.sql.Encoder to handle TransmogrifAI types efficiently.
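As a rough sketch of the target shape (my assumption, not the proposed design), a generic kryo-backed Encoder compiles and lets Datasets hold TransmogrifAI types, but it stores each value as an opaque binary column rather than the efficient native layout this issue asks for:

import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}
import com.salesforce.op.features.types.FeatureType

// Placeholder only: kryo serializes every value into one binary column,
// so Spark cannot prune, push down or optimize on the underlying primitives.
implicit def featureTypeEncoder[T <: FeatureType : ClassTag]: Encoder[T] = Encoders.kryo[T]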
Alternatives
N/A
Additional context
Ideally we should also avoid boxing/unboxing into TransmogrifAI types, but this would require major refactoring. This is up for discussion.
A Spark DataFrame is a Dataset of Row, and an Encoder for Row already exists, so you don't need to define a new Encoder or Decoder for Row.
@liuzhenhai93 yeah, DataFrame encoding is currently working. We would like to have support for the following:
implicit val enc: Encoder[(Real, Text)] = ???
val reals: Dataset[(Real, Text)] = spark.createDataset(Seq(1.0.toReal -> "one".toText))
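Until a native Encoder exists, one hedged stop-gap (assuming kryo's opaque-binary trade-off is acceptable) is to compose encoders explicitly:

import org.apache.spark.sql.{Dataset, Encoder, Encoders}
import com.salesforce.op.features.types._

// assumes an existing SparkSession named spark
implicit val enc: Encoder[(Real, Text)] = Encoders.tuple(Encoders.kryo[Real], Encoders.kryo[Text])
val reals: Dataset[(Real, Text)] = spark.createDataset(Seq(1.0.toReal -> "one".toText))

Each side of the tuple is then stored as a single binary column, which trades away exactly the efficiency this issue is about.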
@tovbinm you can try something like this in Scala:

import spark.implicits._

case class Wrap[T](unwrap: T)

Then whenever you want to use a custom type, put it inside Wrap, like this:

val dataset = spark.createDataset(Seq(Wrap((2.0, "hello"))))
I don't believe this would work (I will check). Ideally I would like to avoid allocating yet another wrapper class, since we already do so (FeatureType is a wrapper around Option, Seq, Map, etc.).
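For context, a sketch of why the wrapper likely falls short (my reading of Spark's product-encoder derivation, not verified here): Wrap only helps when its field types are already encodable, and TransmogrifAI types are not:

import spark.implicits._
import com.salesforce.op.features.types._

case class Wrap[T](unwrap: T)

// (Double, String) fields are Spark-native, so derivation succeeds:
spark.createDataset(Seq(Wrap((2.0, "hello"))))
// Real is not a type Spark's product encoder knows about, so this likely
// fails with a "No Encoder found" error when the encoder is created:
// spark.createDataset(Seq(Wrap(1.0.toReal)))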