featran
A Scala feature transformation library for data science and machine learning
Updates [org.apache.flink:flink-scala](https://github.com/apache/flink) from 1.14.4 to 1.15.0. [GitHub Release Notes](https://github.com/apache/flink/releases/tag/release-1.15.0) - [Version Diff](https://github.com/apache/flink/compare/release-1.14.4...release-1.15.0) I'll automatically update this PR to resolve conflicts as long as you don't change it yourself. If you'd...
Hi, @regadas , @richwhitjr , I'd like to report a vulnerable dependency in **com.spotify:featran-spark_2.12:0.8.0-RC2**. ### Issue Description I noticed that **com.spotify:featran-spark_2.12:0.8.0-RC2** directly depends on **org.apache.spark:spark-core_2.12:3.1.1** in the [pom](https://repo1.maven.org/maven2/com/spotify/featran-spark_2.12/0.8.0-RC2/featran-spark_2.12-0.8.0-RC2.pom). However, as...
Updates [org.scala-sbt:sbt](https://github.com/sbt/sbt) from 1.6.1 to 1.6.2. [GitHub Release Notes](https://github.com/sbt/sbt/releases/tag/v1.6.2) - [Version Diff](https://github.com/sbt/sbt/compare/v1.6.1...v1.6.2) I'll automatically update this PR to resolve conflicts as long as you don't change it yourself. If you'd...
Updates [org.scalanlp:breeze](https://github.com/scalanlp/breeze) from 1.3 to 2.0. I'll automatically update this PR to resolve conflicts as long as you don't change it yourself. If you'd like to skip this version, you...
Had a play with solving this one: https://github.com/spotify/featran/issues/51 CL: - Implements functionally equivalent output to: https://github.com/tensorflow/transform/blob/master/tensorflow_transform/mappers.py#L1209 using the Transformer API on `Array[Int]`. It is efficient because it iterates over...
Useful for quick debugging when using IDEs like IntelliJ.
It could be useful to have a transformer that applies global heavy hitters to a Seq-like attribute. Something like `NNHeavyHitters extends Transformer[Seq[A], SketchMap[String, Long], Map[String, (Int, Long)]]`
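To make the idea concrete, here is a minimal plain-Scala sketch of what "global heavy hitters over a Seq attribute" could mean. It is not featran's Transformer API: `HeavyHittersSketch`, `heavyHitters`, and `encode` are hypothetical names, exact counts stand in for the `SketchMap` mentioned in the proposal, and reading the `(Int, Long)` pair as (rank, count) is an assumption.

```scala
// Hypothetical sketch, not part of featran.
// Exact counting is used here in place of an approximate SketchMap.
object HeavyHittersSketch {

  // Count every item across all rows, keep the k most frequent,
  // and map each one to (rank, count). Rank 0 is the most frequent item.
  def heavyHitters(rows: Seq[Seq[String]], k: Int): Map[String, (Int, Long)] = {
    val counts: Map[String, Long] =
      rows.flatten.groupBy(identity).map { case (item, xs) => item -> xs.size.toLong }

    counts.toSeq
      .sortBy { case (item, count) => (-count, item) } // ties broken by item name
      .take(k)
      .zipWithIndex
      .map { case ((item, count), rank) => item -> (rank, count) }
      .toMap
  }

  // Encode one row as the ranks of its items that are heavy hitters.
  // Items outside the top k are dropped.
  def encode(row: Seq[String], hitters: Map[String, (Int, Long)]): Seq[Int] =
    row.flatMap(item => hitters.get(item).map(_._1))
}
```

In this sketch the global pass (`heavyHitters`) and the per-row pass (`encode`) are separate functions, which roughly mirrors the aggregate-then-transform shape the proposed type signature suggests.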
It would be nice to have a function on FeatureSpec that adds a prefix to each Transformer name.
```scala
FeatureSpec.of[T].withNamedPrefix("my_prefix").required(...)
```
The main reason is that...
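One way to picture the requested behaviour is the toy model below. It does not use featran's types: `Spec`, `NamedStep`, and `featureNames` are hypothetical, `required` takes only a name here, and joining prefix and name with an underscore is an assumption, since the request does not specify a separator.

```scala
// Hypothetical model of a spec whose step names can carry a prefix.
// None of these types come from featran.
final case class NamedStep(name: String)

final case class Spec(prefix: Option[String], steps: Vector[NamedStep]) {

  // Remember the prefix; it applies to steps added afterwards.
  def withNamedPrefix(p: String): Spec = copy(prefix = Some(p))

  // Add a step, prepending the prefix (if any) to its name.
  // The "_" separator is an assumption.
  def required(name: String): Spec = {
    val fullName = prefix.fold(name)(p => s"${p}_$name")
    copy(steps = steps :+ NamedStep(fullName))
  }

  def featureNames: Seq[String] = steps.map(_.name)
}

object Spec {
  val empty: Spec = Spec(None, Vector.empty)
}
```

With this model, `Spec.empty.withNamedPrefix("my_prefix").required("age").featureNames` yields a single name, `my_prefix_age`, while a spec built without a prefix keeps the plain name.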
Right now most data pipeline developers test code by faking input and expected output, which may not be feasible for pipelines using featran. Also, featran pipelines may have multiple outputs,...