frameless
Expressive types for Spark.
My understanding is that `Option` should be used to represent columns that one might mark nullable in vanilla Spark. I tried something along the lines of the following: ``` case...
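A minimal sketch of that understanding, assuming a local `SparkSession` is in implicit scope (the `Person` case class and its fields are hypothetical, for illustration only): an `Option[A]` field in the case class corresponds to a column that vanilla Spark would mark nullable.

```scala
import org.apache.spark.sql.SparkSession
import frameless.TypedDataset

implicit val spark: SparkSession =
  SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()

// Hypothetical example: `nickname` may be absent, so it is modeled as Option
case class Person(name: String, nickname: Option[String])

val people = TypedDataset.create(Seq(
  Person("Alice", Some("Al")),
  Person("Bob", None)
))

// In the underlying Spark schema, the Option-backed column is nullable
people.dataset.printSchema()
```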
Consider the following example:

```scala
import frameless.functions.aggregate.{collectSet, max, min}
import frameless.syntax._
import frameless.TypedDataset

case class Foo(bar: Int)

val ds = TypedDataset.create(List.empty[Foo])

ds
  .agg(
    min(ds('bar)),
    collectSet(ds('bar))
  )
  .collect
  .run
```

It...
We are currently missing these two Dataset methods:
- `DataStreamWriter writeStream()`
- `Dataset withWatermark(String eventTime, String delayThreshold)`

These require some understanding of Spark streaming to be properly typed and tested....
Would it make sense to introduce support for `avro` schemas for `TypedDataset`? The current code defines the schema based on the `SparkSQL` "language": https://github.com/typelevel/frameless/blob/576eb675dbd121453679a57ae7117e4fb53d9212/dataset/src/main/scala/frameless/TypedDatasetForwarded.scala#L43-L44 On the other hand...
It is possible to define a case class with reserved field names using back-ticks.

```scala
case class Foo(a: String, `if`: Int)

val t = TypedDataset.create(Seq(Foo("a", 2), Foo("b", 2)))
```

Fails with the...
Hi, I have recently started exploring frameless and am trying to figure out joins, especially left and right joins. Would it be possible to add additional examples to the documentation? It...
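A hedged sketch of what such an example might look like, assuming a local `SparkSession` in implicit scope; the `Emp`/`Dept` case classes are hypothetical. `joinLeft` keeps every row of the left dataset and pairs unmatched rows with `None`, which the types make explicit:

```scala
import org.apache.spark.sql.SparkSession
import frameless.TypedDataset
import frameless.syntax._

implicit val spark: SparkSession =
  SparkSession.builder().master("local[*]").appName("joins").getOrCreate()

// Hypothetical schemas for illustration
case class Emp(name: String, deptId: Int)
case class Dept(id: Int, label: String)

val emps  = TypedDataset.create(Seq(Emp("ann", 1), Emp("bob", 2)))
val depts = TypedDataset.create(Seq(Dept(1, "eng")))

// Result type is TypedDataset[(Emp, Option[Dept])]:
// "bob" has no matching department, so his right side is None
val joined = emps.joinLeft(depts)(emps('deptId) === depts('id))
```

`joinRight` is symmetric, yielding `TypedDataset[(Option[Emp], Dept)]`.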
Vanilla Spark:

```scala
val df: DataFrame = ???
val filtered = df.filter(df("value")
```
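For comparison, a minimal sketch of the frameless equivalent of a column-based filter, assuming a local `SparkSession` in implicit scope and a hypothetical single-column `Rec` case class; the column reference is checked at compile time rather than by string name:

```scala
import org.apache.spark.sql.SparkSession
import frameless.TypedDataset
import frameless.syntax._

implicit val spark: SparkSession =
  SparkSession.builder().master("local[*]").appName("filter").getOrCreate()

// Hypothetical schema for illustration
case class Rec(value: Int)

val ds = TypedDataset.create(Seq(Rec(1), Rec(-2), Rec(3)))

// ds('value) is a TypedColumn[Rec, Int]; a typo in the name fails to compile
val filtered = ds.filter(ds('value) > 0)
```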
Hello, I am starting with frameless and I am having a hard time converting my code based on Spark DataFrames to the frameless framework. The blocking point I have reached now is...
Meta-issue to list what has been done in frameless-ml and what remains to be done. Spark ML docs: https://spark.apache.org/docs/latest/ml-guide.html # Abstractions - [x] `TypedTransformer`, the type-safe equivalent of Spark ML...
Exhaustive status of the API implemented by `frameless.TypedColumn` compared to Spark's `Column`. It is split into two parts: the methods implemented directly on `Column`, and the methods coming from `org.apache.spark.sql.functions._` ### Column...