
Add Segment Indices transformer

Open aaronquantexa opened this issue 3 years ago • 1 comments

Had a play with solving this one: https://github.com/spotify/featran/issues/51

CL:

  • Implements output functionally equivalent to https://github.com/tensorflow/transform/blob/master/tensorflow_transform/mappers.py#L1209 using the Transformer API on Array[Int]. It is efficient because it iterates over the input Array only once (O(n)), using a fold.
  • Adds support for Int data types in Feature Builder, specifically for Segment Indices
  • Adds an example usage of the new Transformer to the plain Scala Examples
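
For readers unfamiliar with the operation, here is a minimal standalone sketch of the single-pass idea in plain Scala. It is not the code from this PR and does not use the featran Transformer API; the object and method names are hypothetical, and it assumes the input segment ids are non-decreasing.

```scala
// Hypothetical illustration only: not the PR's implementation.
object SegmentIndicesSketch {

  /** For non-decreasing segment ids, return each element's position
    * within its own segment, e.g. [0, 0, 1, 2, 2, 2] => [0, 1, 0, 0, 1, 2].
    */
  def segmentIndices(segmentIds: Array[Int]): Array[Int] = {
    val out = Array.newBuilder[Int]
    out.sizeHint(segmentIds.length)

    // Accumulator: (previous segment id, index within the current segment).
    // Each element is visited exactly once, so the pass is O(n).
    segmentIds.foldLeft((Option.empty[Int], -1)) { case ((prev, idx), id) =>
      // Same id as the previous element: advance; otherwise restart at 0.
      val next = if (prev.contains(id)) idx + 1 else 0
      out += next
      (Some(id), next)
    }

    out.result()
  }
}
```

Calling `SegmentIndicesSketch.segmentIndices(Array(0, 0, 1, 2, 2, 2))` yields `Array(0, 1, 0, 0, 1, 2)`. The fold carries only the previous id and the running index, so no second pass or lookup table is needed.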

Testing-wise, I've verified the results manually on a few arrays by running the Example, and added a property test which checks the result on 100 randomly generated (and valid) arrays. The test does not fully follow the pattern of the other property tests in the repo, which also check feature column names etc.

Any suggestions are welcome, especially from someone who knows the repo well: I think the core algorithm is solid, but I'm unsure how much of the surrounding code still needs work.

aaronquantexa, Jun 20 '21 21:06

In SegmentIndicesSpec.scala, I had a functional-style version of the Arbitrary randomIncreasingArray that used a fold, but it was less clear than the mutable alternative, which suits the dynamic style of the logic, so I went with the mutable implementation.
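
To illustrate the mutable style described here, the following is a rough sketch of how a random non-decreasing array could be generated. It is an assumption-laden stand-in, not the randomIncreasingArray from SegmentIndicesSpec.scala: it uses scala.util.Random rather than a ScalaCheck Arbitrary, and the object and method names are hypothetical.

```scala
import scala.util.Random

// Hypothetical illustration only: not the generator from the PR's spec.
object IncreasingArraySketch {

  /** Build an array of non-negative, non-decreasing segment ids starting at 0.
    * Mutable state (`current`) tracks the id of the segment being filled.
    */
  def randomNonDecreasingArray(length: Int, rng: Random): Array[Int] = {
    val out = new Array[Int](length)
    var current = 0
    var i = 0
    while (i < length) {
      // With probability 1/2, start a new segment (never on the first element).
      if (i > 0 && rng.nextBoolean()) current += 1
      out(i) = current
      i += 1
    }
    out
  }
}
```

A property test could then generate, say, 100 such arrays and check that the transformer output restarts at 0 whenever the segment id changes and increases by 1 otherwise.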

aaronquantexa, Jun 20 '21 21:06