spark-timeseries
added type conversion toDenseVector in TimeSeriesUtil.rebaseWithUniformSource
The original call vec(startLoc until endLoc) on a Vector[T] returns a SliceVector[Int, T] (as shown below). SliceVector[Int, T] does not inherit from Serializable, as per the breeze docs.
Only the relevant parts of the REPL output are kept below:
scala> import breeze.linalg._
scala> val dv: DenseVector[Double] = DenseVector.zeros[Double](5)
scala> val v: Vector[Double] = DenseVector.zeros[Double](5)
scala> dv(0 until 5)
res6: breeze.linalg.DenseVector[Double] = DenseVector(0.0, 0.0, 0.0, 0.0, 0.0)
scala> v(0 until 5)
res7: breeze.linalg.SliceVector[Int,Double] = breeze.linalg.SliceVector@7daa0333
scala> sc.parallelize(Array(dv(0 until 5))).collect() // works
scala> sc.parallelize(Array(v(0 until 5))).collect() // fails
15/08/14 11:59:09 ERROR TaskSetManager: Failed to serialize task 47, not attempting to retry it.
java.lang.reflect.InvocationTargetException
...
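The same check can be made without a Spark context. Below is a minimal sketch, assuming only the Java standard library; isJavaSerializable is a hypothetical helper name, not part of breeze or spark-timeseries. It simply attempts plain Java serialization and reports whether it succeeded.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper (not part of breeze or spark-timeseries): returns true
// if `obj` can be written with plain Java serialization, false if some object
// in its graph does not implement java.io.Serializable.
def isJavaSerializable(obj: AnyRef): Boolean = {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try {
    out.writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  } finally {
    out.close()
  }
}

// Sanity checks with standard-library values:
val plainObject = new Object()        // java.lang.Object is not Serializable
val doubleArray = Array(0.0, 0.0, 0.0) // arrays of primitives are Serializable
```

With breeze on the classpath, passing v(0 until 5) and dv(0 until 5) from the session above to this helper should distinguish the two cases, assuming the breeze version in use still has a SliceVector that does not extend Serializable.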
This means that any function in the public API that calls this private helper currently throws an exception when its result has to be serialized. For example, calling slice on a TimeSeriesRDD with a uniform index goes through this function and fails. Converting the sliced result to a DenseVector solves this issue.
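The conversion itself is small. The following is a sketch of the idea rather than the actual body of TimeSeriesUtil.rebaseWithUniformSource: sliceDense is a hypothetical stand-in, specialised to Double for brevity, and it assumes breeze is on the classpath.

```scala
import breeze.linalg.{DenseVector, Vector}

// Hypothetical stand-in for the slicing step inside the private helper.
// Slicing a Vector[Double] by a Range yields a SliceVector view (as in the
// REPL session above); toDenseVector copies that view into a DenseVector,
// which implements java.io.Serializable.
def sliceDense(vec: Vector[Double], startLoc: Int, endLoc: Int): DenseVector[Double] =
  vec(startLoc until endLoc).toDenseVector

val source: Vector[Double] = DenseVector(1.0, 2.0, 3.0, 4.0)
val sliced = sliceDense(source, 1, 3)
```

Note that toDenseVector copies the selected elements instead of keeping a view onto the source vector, so the fix trades one extra allocation for a result that can be shipped between Spark tasks.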
(Additionally, this issue is masked when the Kryo serializer is configured, since that seems to get around it. I'm not clear on why that is, but it seems reasonable to think that the user shouldn't have to do that to be able to use the API.)