spark-tsne

t-SNE package does not seem to work with Spark 2.1

Open kartha01 opened this issue 8 years ago • 7 comments

Hi,

It looks like the t-SNE package does not work with Spark 2.1. After importing the com.github.saurfang.* package, even a simple computation of column statistics (mean, variance, etc.) fails to compile:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

val observations = sc.parallelize(
  Seq(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(3.0, 30.0, 300.0)
  )
)

// Compute column summary statistics.
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
println(summary.mean)  // a dense vector containing the mean value for each column
println(summary.variance)  // column-wise variance
println(summary.numNonzeros)  // number of nonzeros in each column

and that fails with:

Name: Compile Error
Message: <console>:50: error: type mismatch;
 found   : org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
 required: org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
                                                                         ^
StackTrace: 

Any thoughts on resolving this?

Thanks, Rajesh

kartha01, May 31 '17 18:05

I've looked into this, and am currently working on a patch.

erwinvaneijk, Jun 02 '17 13:06

Thanks for replying, Erwin. That is great! Please let me know how it goes; I'd be glad to test it out.

Regards, Rajesh

kartha01, Jun 02 '17 21:06

I've created a new PR, which has a green build. You can check it out for yourself:

https://github.com/erwinvaneijk/spark-tsne.git branch upgrade_to_2.1

I have no idea if or when it will be merged into the main repo.

Kind regards, EJ

erwinvaneijk, Jun 04 '17 09:06

Thanks EJ, will try your patch out and let you know.

-Rajesh

kartha01, Jun 08 '17 17:06

I wonder why X2Helper.scala resides in the org/apache/spark/mllib package. Is that really needed? Any thoughts?
I am not yet sure whether that could be causing issues in our environment. The MNIST example in the code still seems to throw the same old error.

kartha01, Jun 09 '17 16:06

Hi Rajesh, no idea, but it shouldn't give you that message. Your exact code is in the new test, so there's probably something else wrong in your code or build. I'll take a look if you want?

erwinvaneijk, Jun 10 '17 11:06

Thanks Erwin.

I was trying to run it in the cloud environment and gave up. Now I am trying it on a regular Hadoop + Spark 2.0 cluster with spark-shell, and while running the MNIST example I am getting:

java.lang.NoSuchMethodError: breeze.linalg.DenseMatrix$.ones$mDc$sp(IILscala/reflect/ClassTag;Lbreeze/storage/Zero;Lbreeze/math/Semiring;)Lbreeze/linalg/DenseMatrix;
  at com.github.saurfang.spark.tsne.impl.BHTSNE$.tsne(BHTSNE.scala:38)
  ... 92 elided

The Breeze jars in my Spark 2.0 instance are breeze_2.11-0.11.2.jar and breeze-macros_2.11-0.11.2.jar.

I wonder whether there is a specific jar that I need to pick up for this.

kartha01, Jun 13 '17 01:06
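
A note on the last error: a `NoSuchMethodError` at runtime generally means the calling code was compiled against a different version of a library than the one found on the runtime classpath. Here that would suggest the spark-tsne build and the cluster disagree on the Breeze version, though the thread does not confirm this. One way to check which jar actually supplies a class is sketched below for spark-shell. `codeSourceOf` is a hypothetical helper name introduced for illustration; it is not part of Spark, Breeze, or spark-tsne.

```scala
import scala.util.Try

// Hypothetical diagnostic helper (an assumption, not a Spark or spark-tsne API):
// returns the location a class was loaded from, or None if the class cannot be
// found or the JVM does not expose a code source for it.
def codeSourceOf(className: String): Option[String] =
  for {
    cls    <- Try(Class.forName(className)).toOption // None if the class is missing
    domain <- Option(cls.getProtectionDomain)
    source <- Option(domain.getCodeSource)           // may be null for JDK classes
    url    <- Option(source.getLocation)
  } yield url.toString

// Which jar provides Breeze's DenseMatrix in this session?
println(codeSourceOf("breeze.linalg.DenseMatrix"))
```

If the printed location points at the cluster's breeze_2.11-0.11.2.jar while spark-tsne was built against a newer Breeze, rebuilding against the cluster's Spark version (or aligning the Breeze jars) would be the thing to try.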