featran
Add Segment Indices transformer
Had a play with solving this one: https://github.com/spotify/featran/issues/51
CL:
- Implements functionally equivalent output to https://github.com/tensorflow/transform/blob/master/tensorflow_transform/mappers.py#L1209 using the Transformer API on `Array[Int]`. It is efficient because it iterates over the input array once (O(n)) using a fold.
- Adds support for `Int` data types in Feature Builder, specifically for Segment Indices.
- Adds example usage of the new Transformer to the plain Scala Examples.
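
For reviewers unfamiliar with the operation, here is a minimal standalone sketch of the single-pass fold idea. It is not the Transformer code from this PR: the object and method names are hypothetical, and it assumes that equal segment ids appear in contiguous runs.

```scala
object SegmentIndicesSketch {
  // For each element, compute its position within its run of equal segment ids.
  // Hypothetical helper for illustration; not part of the featran API.
  def segmentIndices(segmentIds: Array[Int]): Array[Int] = {
    val builder = Array.newBuilder[Int]
    // Fold state: (previous segment id, index assigned to the previous element).
    segmentIds.foldLeft((Option.empty[Int], -1)) { case ((prevId, prevIdx), id) =>
      // Continue counting inside the same segment, restart at 0 on a new one.
      val idx = if (prevId.contains(id)) prevIdx + 1 else 0
      builder += idx
      (Some(id), idx)
    }
    builder.result()
  }
}
```

With this sketch, `SegmentIndicesSketch.segmentIndices(Array(0, 0, 0, 1, 1, 2, 3, 3))` yields `Array(0, 1, 2, 0, 1, 0, 0, 1)`. Each element is visited exactly once, which is where the O(n) bound comes from.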
Testing-wise, I've verified the results on a few arrays manually by running the Example, and added a property test that checks the result on 100 randomly generated (and valid) arrays. The test does not fully follow the pattern of the other property tests in the repo, which also check feature column names etc.
Any suggestions are welcome, especially from someone who knows the repo. I think the core algorithm is solid, but I'm unsure how much of the surrounding code still needs work.
In SegmentIndicesSpec.scala, I had a functional-looking implementation of the Arbitrary `randomIncreasingArray` that used a fold, but it was less clear than the mutable alternative, which suits the dynamic style of the logic, so I went with the mutable implementation.
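
To give a feel for the mutable style, here is a rough sketch of such a generator. It is not the code in SegmentIndicesSpec.scala: it uses `scala.util.Random` rather than a ScalaCheck `Arbitrary`, the `maxLength` parameter and the object name are hypothetical, and it assumes a "valid" input means non-decreasing segment ids starting at 0.

```scala
import scala.util.Random

object IncreasingArraySketch {
  // Builds a random non-decreasing array of segment ids with at most maxLength elements.
  // Hypothetical stand-in for the PR's generator, written with plain mutable state.
  def randomIncreasingArray(rng: Random, maxLength: Int): Array[Int] = {
    val n = rng.nextInt(maxLength + 1) // length in [0, maxLength]
    val out = new Array[Int](n)
    var current = 0
    var i = 0
    while (i < n) {
      // After the first element, start a new segment with probability 1/2.
      if (i > 0 && rng.nextBoolean()) current += 1
      out(i) = current
      i += 1
    }
    out
  }
}
```

The loop makes the "maybe bump the segment id, then emit" step explicit, which is the part that read awkwardly when threaded through a fold accumulator.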