scalda
Unable to do topicProportions on loaded LDA model.
Saving the model as:
val lda = LocalOnlineLda(
  OnlineLdaParams(
    vocabulary = lines(vocabFile).toIndexedSeq,
    alpha = 1.0 / numTopics,
    eta = 1.0 / numTopics,
    decay = 128,
    learningRate = 0.7,
    maxIter = 1000,
    convergenceThreshold = 0.001,
    numTopics = numTopics,
    totalDocs = numDocs,
    perplexity = true
  )
)
val model = lda.inference(new TextFileIterator(corpusDir, mbSize))
lda.saveModel(model, new File("/home/xyz/tmp/lda_model"))
and loading it as:
val lda = LocalOnlineLda.empty
val model = lda.loadModel(new File("/home/xyz/tmp/lda_model")).get
val docloc = new File("/home/xyz/tmp/test_dataset/33629")
val testdoc = text(docloc)
val topicprops = lda.topicProportions(testdoc, model, Some(com.nitro.scalda.tokenizer.StanfordLemmatizer()))
This throws java.lang.IllegalArgumentException: requirement failed: Dimension mismatch! at the line val topicprops = lda.topicProportions(...).
Error Log:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Dimension mismatch!
at scala.Predef$.require(Predef.scala:224)
at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:53)
at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:48)
at breeze.linalg.ImmutableNumericOps$class.$times(NumericOps.scala:135)
at breeze.linalg.DenseMatrix.$times(DenseMatrix.scala:53)
at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda$$anonfun$eStep$2.apply(LocalOnlineLDA.scala:83)
at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda$$anonfun$eStep$2.apply(LocalOnlineLDA.scala:74)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda.eStep(LocalOnlineLDA.scala:74)
at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda.topicProportions(LocalOnlineLDA.scala:270)
at testmodel$.delayedEndpoint$testmodel$1(testmodel.scala:28)
at testmodel$delayedInit$body.apply(testmodel.scala:7)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at testmodel$.main(testmodel.scala:7)
at testmodel.main(testmodel.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Scala Version: 2.11.8 System: Ubuntu 14.04
I'm experiencing the same issue.
It seems the problem occurs because the example uses the LocalOnlineLda.empty method to create the lda object.
https://github.com/Nitro/scalda/blob/74c585af40db6e4426a3903aa50f9dd743c0b2ec/src/main/scala/com/nitro/scalda/examples/TopicProportionsExample.scala#L25
When using empty, numTopics is set to zero (0). Later, this value is used to create initialGamma, which comes out as an empty matrix (1 row, 0 columns).
https://github.com/Nitro/scalda/blob/74c585af40db6e4426a3903aa50f9dd743c0b2ec/src/main/scala/com/nitro/scalda/models/onlineLDA/local/LocalOnlineLDA.scala#L264-L268
From what I can tell, this matrix is used in several operations and eventually fails because it is expected to have as many columns as the original model has topics. Since numTopics was set to zero, the matrix multiplication fails 😞.
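To illustrate the failure mode in isolation, here is a minimal sketch that uses only Breeze (already on the classpath, judging by the stack trace). It is not the scalda code: the object name, helper names, and the sizes 10 and 5 are made up for the example, and the topics-by-words matrix is just a stand-in for whatever eStep multiplies gamma with.

```scala
import breeze.linalg._
import scala.util.Try

object EmptyGammaMismatch {
  // Hypothetical helper: a 1 x k gamma row, built with the same
  // (rows, cols, data) constructor that topicProportions uses.
  def gamma(k: Int): DenseMatrix[Double] =
    new DenseMatrix[Double](1, k, Array.fill(k)(1.0))

  // Hypothetical stand-in for a topics-by-words slice of the loaded model.
  def topicWord(k: Int, w: Int): DenseMatrix[Double] =
    new DenseMatrix[Double](k, w, Array.fill(k * w)(1.0))

  def main(args: Array[String]): Unit = {
    val numTopics = 10 // assumed topic count of the saved model
    val numWords  = 5  // assumed number of distinct words in the test document

    // numTopics = 0 (as with LocalOnlineLda.empty) gives a 1 x 0 gamma.
    // (1 x 0) * (10 x 5) has mismatched inner dimensions, so Breeze's
    // require(...) is expected to fail here.
    println(Try(gamma(0) * topicWord(numTopics, numWords)))

    // Sizing gamma from the model's topic count makes the product valid:
    // (1 x 10) * (10 x 5) yields a 1 x 5 matrix.
    val ok = gamma(numTopics) * topicWord(numTopics, numWords)
    println((ok.rows, ok.cols))
  }
}
```

The second multiplication corresponds to sizing gamma from the loaded model rather than from the params of an empty LocalOnlineLda.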
A possible solution is to use model.lambda.rows instead of params.numTopics in the topicProportions method, as this matches the original number of topics.
// LocalOnlineLDA.topicProportions
val initialGamma = new DenseMatrix[Double](
  1,
  model.lambda.rows,
  G(100.0, 1.0 / 100.0).sample(model.lambda.rows).toArray
)
Maybe the Nitro team can tell us if this is the correct way to address the problem?
Thanks!