spark-tsne
t-SNE package does not seem to work with Spark 2.1
Hi,
It looks like the t-SNE package does not work with Spark 2.1. After importing the com.github.saurfang.* package, even a simple snippet that computes column statistics (mean, variance, etc.) fails to compile:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
val observations = sc.parallelize(
  Seq(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(3.0, 30.0, 300.0)
  )
)
// Compute column summary statistics.
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
println(summary.mean) // a dense vector containing the mean value for each column
println(summary.variance) // column-wise variance
println(summary.numNonzeros) // number of nonzeros in each column
and that fails with:
Name: Compile Error
Message: <console>:50: error: type mismatch;
found : org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
required: org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
^
StackTrace:
Any thoughts on resolving this?
Thanks, Rajesh
I've looked into this, and am currently working on a patch.
Thanks for replying Erwin. That is great! please let me know how it goes, glad to test it out.
Regards, Rajesh
I've created a new PR, which has a green build. You can check it out for yourself:
https://github.com/erwinvaneijk/spark-tsne.git branch upgrade_to_2.1
I have no idea if or when it will get merged into the main repo.
Kind regards, EJ
Thanks EJ, will try your patch out and let you know.
-Rajesh
I wonder why X2Helper.scala resides in the org/apache/spark/mllib package. Is that really needed? Any thoughts?
I am not yet sure if that could be causing some issues in our environment. The MNIST example in the code seems to throw the same old error.
Hi Rajesh - no idea, but it shouldn't give you the message. Your exact code is in the new test, so there's probably something else wrong in your code or build. I'll take a look if you want?
Thanks Erwin.
I was trying to run it in the cloud environment and gave up. Now I am trying it on a regular Hadoop + Spark 2.0 cluster with spark-shell, and while running the MNIST example I am getting:
java.lang.NoSuchMethodError: breeze.linalg.DenseMatrix$.ones$mDc$sp(IILscala/reflect/ClassTag;Lbreeze/storage/Zero;Lbreeze/math/Semiring;)Lbreeze/linalg/DenseMatrix;
at com.github.saurfang.spark.tsne.impl.BHTSNE$.tsne(BHTSNE.scala:38)
... 92 elided
The breeze jars that I have in my Spark 2.0 instance are breeze_2.11-0.11.2.jar and breeze-macros_2.11-0.11.2.jar.
I wonder if there is a specific jar that I need to pick up for this.
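A NoSuchMethodError like the one above usually means the code was compiled against a different version of a library than the one found on the runtime classpath. One way to see which breeze jar spark-shell actually loads is a small reflection check. This is only a sketch: `ClasspathCheck` and `jarOf` are hypothetical names made up for illustration, not part of spark-tsne, Spark, or breeze.

```scala
import scala.util.Try

// Hypothetical diagnostic helper (not part of spark-tsne, Spark, or breeze).
object ClasspathCheck {
  // Returns the location (typically a jar URL) a class was loaded from.
  // Yields None if the class cannot be found, or if the JVM reports no
  // code source for it (as can happen for classes loaded by the bootstrap loader).
  def jarOf(className: String): Option[String] =
    Try(Class.forName(className)).toOption
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
}

// Example use in spark-shell (output depends on your installation):
//   ClasspathCheck.jarOf("breeze.linalg.DenseMatrix").foreach(println)
```

If the printed path points at the breeze 0.11.2 jar shipped with the cluster, the failing call in BHTSNE.scala is resolving against that jar rather than the breeze version the package was built with.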