
Handle sparse data

Open • ravwojdyla opened this issue 7 years ago • 1 comment

There is a production case like this:

  case class TrainingExample(indices: List[Int],
                             data: List[Float],
                             label: Float,
                             weight: Float)

  object TestFeatureSpec {
    val featuresType: TensorFlowType[TrainingExample] = TensorFlowType[TrainingExample]
  }
...

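  // sv(0) carries the label and, optionally, a weight; sv(1) carries the sparse features.
  def convertToTrainingExample(sv: Seq[SparseVector[Float]]): TrainingExample = {
    val labelData = sv(0).data
    val label = labelData.head
    // Use the explicit weight when one is present, otherwise fall back to the default.
    val weight = labelData.length match {
      case 2 => labelData(1)
      case _ => defaultWeight
    }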
    TrainingExample(
      sv(1).index.toList,
      sv(1).data.toList,
      label,
      weight
    )
  }

...

    val features = extracted
      .featureValues[SparseVector[Float]]
      .map(sv => (sampler.getPartition(), convertToTrainingExample(sv)))
      .map { case (partition, example) =>
        (partition, TestFeatureSpec.featuresType.toExample(example))
      }
...

I guess the list fields (indices, data) might be a problem, but can we handle this case?

ravwojdyla • Jan 25 '18 20:01

@ravwojdyla I guess if we provide code for Sparse -> Example, then we also need to provide code for Example -> Sparse, probably in both Scala and Python.

yonromai • Jan 26 '18 22:01
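
To make the round trip discussed above concrete, here is a minimal sketch in plain Scala. Everything in it is an assumption for illustration: `Sparse`, `Feature`, `Int64List`, `FloatList` and `SparseCodec` are hypothetical stand-ins, not the spotify-tensorflow, Breeze or TensorFlow API. It also assumes an extra `size` feature, since the dense length cannot be recovered from the indices alone.

```scala
// Hypothetical stand-ins only: a minimal model of a sparse vector and of an
// Example-like feature map. None of these types come from a real library.
final case class Sparse(size: Int, index: Vector[Int], data: Vector[Float])

sealed trait Feature
final case class Int64List(values: List[Long]) extends Feature
final case class FloatList(values: List[Float]) extends Feature

object SparseCodec {
  type Example = Map[String, Feature]

  // Sparse -> Example: indices and values become two variable-length lists.
  // "size" is an assumed extra feature that keeps the dense dimension.
  def toExample(sv: Sparse, label: Float, weight: Float): Example = {
    require(sv.index.length == sv.data.length, "index and data must align")
    Map(
      "indices" -> Int64List(sv.index.map(_.toLong).toList),
      "data"    -> FloatList(sv.data.toList),
      "size"    -> Int64List(List(sv.size.toLong)),
      "label"   -> FloatList(List(label)),
      "weight"  -> FloatList(List(weight))
    )
  }

  // Example -> Sparse: returns None when a feature is missing, has the wrong
  // type, or when the two lists do not have the same length.
  def fromExample(ex: Example): Option[(Sparse, Float, Float)] =
    (ex.get("indices"), ex.get("data"), ex.get("size"),
     ex.get("label"), ex.get("weight")) match {
      case (Some(Int64List(idx)), Some(FloatList(values)),
            Some(Int64List(List(size))), Some(FloatList(List(label))),
            Some(FloatList(List(weight)))) if idx.length == values.length =>
        Some((Sparse(size.toInt, idx.map(_.toInt).toVector, values.toVector),
              label, weight))
      case _ => None
    }
}
```

With these definitions, `SparseCodec.fromExample(SparseCodec.toExample(sv, label, weight))` gives back the original vector, label and weight. A Python reader would need to agree on the same feature names and list types for the two sides to stay compatible.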