spotify-tensorflow
spotify-tensorflow copied to clipboard
Handle sparse data
There is a production case like this:
case class TrainingExample(indices: List[Int],
data: List[Float],
label: Float,
weight: Float)
object TestFeatureSpec {
val featuresType: TensorFlowType[TrainingExample] = TensorFlowType[TrainingExample]
}
...
def convertToTrainingExample(sv: Seq[SparseVector[Float]]): TrainingExample = {
val labelData = sv(0).data
val label = labelData.head
val weight = labelData.length match {
case a if a == 2 => labelData(1)
case _ => defaultWeight
}
TrainingExample(
sv(1).index.toList,
sv(1).data.toList,
label,
weight
)
}
...
val features = extracted
.featureValues[SparseVector[Float]]
.map(sv => (sampler.getPartition(), convertToTrainingExample(sv)))
.map { case (partition, example) =>
(partition, TestFeatureSpec.featuresType.toExample(example))
}
...
I guess there might be a problem with lists (indices, data), but can we handle this?
@ravwojdyla I guess if we handle code to do Sparse -> Example, then we also need to provide code to do Example -> Sparse - probably both in Scala and Python.