
Unable to do topicProportions on loaded LDA model.

Open r-jenish opened this issue 8 years ago • 1 comments

Saving the model as:

val lda = LocalOnlineLda(
    OnlineLdaParams(
        vocabulary = lines(vocabFile).toIndexedSeq,
        alpha = 1.0/numTopics,
        eta = 1.0/numTopics,
        decay = 128,
        learningRate = 0.7,
        maxIter = 1000,
        convergenceThreshold = 0.001,
        numTopics = numTopics,
        totalDocs = numDocs,
        perplexity = true
    )
)
val model = lda.inference(new TextFileIterator(corpusDir, mbSize))
lda.saveModel(model, new File("/home/xyz/tmp/lda_model"))

and loading it as:

val lda = LocalOnlineLda.empty
val model = lda.loadModel(new File("/home/xyz/tmp/lda_model")).get
val docloc = new File("/home/xyz/tmp/test_dataset/33629")
val testdoc = text(docloc)
val topicprops = lda.topicProportions(testdoc, model, Some(com.nitro.scalda.tokenizer.StanfordLemmatizer()))

This throws Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Dimension mismatch! at the line val topicprops = lda.topicProportions(....).

Error Log:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Dimension mismatch!
    at scala.Predef$.require(Predef.scala:224)
    at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:53)
    at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:48)
    at breeze.linalg.ImmutableNumericOps$class.$times(NumericOps.scala:135)
    at breeze.linalg.DenseMatrix.$times(DenseMatrix.scala:53)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda$$anonfun$eStep$2.apply(LocalOnlineLDA.scala:83)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda$$anonfun$eStep$2.apply(LocalOnlineLDA.scala:74)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda.eStep(LocalOnlineLDA.scala:74)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda.topicProportions(LocalOnlineLDA.scala:270)
    at testmodel$.delayedEndpoint$testmodel$1(testmodel.scala:28)
    at testmodel$delayedInit$body.apply(testmodel.scala:7)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at testmodel$.main(testmodel.scala:7)
    at testmodel.main(testmodel.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Scala Version: 2.11.8 System: Ubuntu 14.04

r-jenish · May 30 '16 12:05

I'm experiencing the same issue.

It seems the problem occurs because the example uses LocalOnlineLda.empty to create the lda object.

https://github.com/Nitro/scalda/blob/74c585af40db6e4426a3903aa50f9dd743c0b2ec/src/main/scala/com/nitro/scalda/examples/TopicProportionsExample.scala#L25

When using empty, numTopics is set to zero (0). This value is later used to create initialGamma, which ends up as an empty matrix (1 row, 0 columns).

https://github.com/Nitro/scalda/blob/74c585af40db6e4426a3903aa50f9dd743c0b2ec/src/main/scala/com/nitro/scalda/models/onlineLDA/local/LocalOnlineLDA.scala#L264-L268

From what I can tell, this matrix is used in several operations and eventually fails because it is expected to have as many columns as the original model had topics. Since numTopics was set to zero, the dimensions do not match and the multiplication fails 😞.

A possible solution is to use model.lambda.rows instead of params.numTopics in the topicProportions method, as this matches the original number of topics.

    // LocalOnlineLDA.topicProportions
   
    val initialGamma = new DenseMatrix[Double](
      1,
      model.lambda.rows,
      G(100.0, 1.0 / 100.0).sample(model.lambda.rows).toArray
    )
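
To make the shape argument concrete, here is a minimal sketch in plain Scala that tracks only matrix shapes. It does not use Breeze or scalda: `Shape`, `GammaShapeSketch`, and `initialGammaShape` are hypothetical names made up for illustration, and the sizes (20 topics, 5000 vocabulary terms, with lambda laid out as topics x vocabulary) are assumptions. The real eStep may multiply different matrices, but the inner-dimension rule is the same.

```scala
// Hypothetical stand-in that tracks only a matrix's shape; it is NOT Breeze's DenseMatrix.
final case class Shape(rows: Int, cols: Int) {
  // Matrix multiplication requires the inner dimensions to agree.
  def *(that: Shape): Shape = {
    require(cols == that.rows, "Dimension mismatch!")
    Shape(rows, that.cols)
  }
}

object GammaShapeSketch {
  // initialGamma is built as a 1 x numTopics matrix.
  def initialGammaShape(numTopics: Int): Shape = Shape(1, numTopics)

  def main(args: Array[String]): Unit = {
    // Assumed sizes, for illustration only: topics x vocabulary.
    val lambda = Shape(rows = 20, cols = 5000)

    // With LocalOnlineLda.empty, numTopics is 0, so gamma is 1 x 0
    // and the product with a 20-row matrix is rejected.
    val fromEmptyParams = initialGammaShape(0)
    println(scala.util.Try(fromEmptyParams * lambda).isFailure)

    // Taking the topic count from the loaded model makes the shapes line up.
    val fromModel = initialGammaShape(lambda.rows)
    println(fromModel * lambda)
  }
}
```

The point of the sketch is only that a 1 x 0 gamma can never be multiplied by anything derived from a model with a non-zero topic count, whereas sizing gamma from model.lambda.rows always agrees with the loaded model.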

Maybe the Nitro team can tell us if this is the correct way to address the problem?

Thanks!

onema · Oct 03 '18 00:10