"comparison of Float with NaN failed"...and GSL is Installed

Open tra38 opened this issue 7 years ago • 2 comments

While trying to fix an unrelated issue, I experimented with the code from #5, but using ZombieWriter::MachineLearning rather than ZombieWriter::Randomization.


zombie = ZombieWriter::MachineLearning.new

zombie.add_string(content: "This is filler text that I invented.This is also a paragraph that could be used")
zombie.add_string(content: "This post is amazing. Please take a look")
zombie.add_string(content: "For all sports fan, you must watch this video. Hey you have to check this out.")

array = zombie.generate_articles

p array

#/Users/tariqali/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/kmeans-clusterer-0.11.4/lib/kmeans-clusterer.rb:237:in `sort_by': comparison of Float with NaN failed (ArgumentError)

The culprit is the third string. classifier-reborn computed its lsi_norm as a vector of NaNs:

 "For all sports fan, you must watch this video. Hey you have to check this out.\n"=>
  #<ClassifierReborn::ContentNode:0x007fdec4b25ae8
   @categories=[],
   @lsi_norm=GSL::Vector
[   nan   nan   nan   nan   nan   nan   nan ... ],
   @lsi_vector=GSL::Vector
[ 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @raw_norm=GSL::Vector
[ 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @raw_vector=GSL::Vector
[ 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @word_hash={:for=>1, :sport=>1, :fan=>1, :must=>1, :watch=>1, :video=>1, :hei=>1, :check=>1, :out=>1}>}

Changing the third string slightly resolves the issue.

zombie = ZombieWriter::MachineLearning.new

zombie.add_string(content: "This is filler text that I invented.This is also a paragraph that could be used")
zombie.add_string(content: "This post is amazing. Please take a look")
zombie.add_string(content: "For all sports fan, you must watch this video. Hey you have to check this out. Filler, filler, filler.")

array = zombie.generate_articles

p array
 "For all sports fan, you must watch this video. Hey you have to check this out. Filler, filler, filler.\n"=>
  #<ClassifierReborn::ContentNode:0x007fd931432fd0
   @categories=[],
   @lsi_norm=GSL::Vector
[ 6.205e-01 1.432e-01 1.432e-01 1.432e-01 1.432e-01 1.432e-01 0.000e+00 ... ],
   @lsi_vector=GSL::Vector
[ 6.593e-01 1.522e-01 1.522e-01 1.522e-01 1.522e-01 1.522e-01 0.000e+00 ... ],
   @raw_norm=GSL::Vector
[ 5.547e-01 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @raw_vector=GSL::Vector
[ 6.272e-01 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @word_hash={:for=>1, :sport=>1, :fan=>1, :must=>1, :watch=>1, :video=>1, :hei=>1, :check=>1, :out=>1, :filler=>3}>}

But why? Both scenarios produced a populated @word_hash, so it isn't clear why one string ended up with a vector of NaNs and the other didn't. Is it because, in the second scenario, the third string shared words with the first string? I will have to research this issue more carefully and decide how to handle this potential error gracefully.
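One thing the two dumps do show: in the failing case @lsi_vector is all zeros, and scaling a zero vector by its own length divides 0.0 by 0.0, which floating-point arithmetic evaluates to NaN. The plain-Ruby sketch below illustrates that step and why a later sort_by then raises. It uses Arrays instead of GSL::Vector, and `normalize` and `sortable?` are hypothetical helpers, not code from classifier-reborn or kmeans-clusterer. It does not explain why the vector came out as zeros in the first place.

```ruby
# Hypothetical helper: scale a vector (plain Array of Floats) to unit length.
# It mimics the idea of normalization; it is not classifier-reborn's code.
def normalize(vector)
  length = Math.sqrt(vector.sum { |x| x * x })
  vector.map { |x| x / length }
end

# Hypothetical helper: report whether sort_by can order the given values.
# NaN <=> anything is nil, so sort_by raises ArgumentError when it meets one.
def sortable?(values)
  values.sort_by { |value| value }
  true
rescue ArgumentError
  false
end

nan_norm = normalize([0.0, 0.0, 0.0])   # every entry is 0.0 / 0.0
unit_norm = normalize([3.0, 4.0, 0.0])  # length is 5.0

p nan_norm.all?(&:nan?)                 # true
p unit_norm                             # [0.6, 0.8, 0.0]
p sortable?(unit_norm)                  # true
p sortable?([unit_norm.first, nan_norm.first]) # false
```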

This problem is unlikely to occur in real-world use: if you add long passages to ZombieWriter, there are bound to be a few overlapping words that classifier-reborn can detect. But it could happen, which is why I need to figure out how to fix it.
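One possible way to handle this gracefully, sketched under assumptions: check each normalized vector for NaN before clustering and set aside the paragraphs that fail the check. The snippet uses plain Arrays in place of GSL::Vector, the vector values are made up for illustration, and `partition_clusterable` is a hypothetical helper that is not part of ZombieWriter, classifier-reborn, or kmeans-clusterer.

```ruby
# Hypothetical helper: split a { text => vector } hash into the entries that
# are safe to cluster and the texts whose vector contains at least one NaN.
def partition_clusterable(vectors_by_text)
  clusterable, skipped = vectors_by_text.partition do |_text, vector|
    vector.none? { |value| value.nan? }
  end
  [clusterable.to_h, skipped.map(&:first)]
end

# Illustrative input only; these numbers are not taken from a real run.
vectors = {
  "This post is amazing. Please take a look" => [1.0, 0.0, 0.0],
  "For all sports fan, you must watch this video." => [Float::NAN, Float::NAN, Float::NAN]
}

clusterable, skipped = partition_clusterable(vectors)
p clusterable.keys # texts that can be passed on to clustering
p skipped          # texts held back because their vector contains NaN
```

Whether the skipped paragraphs should be dropped, reported to the caller, or placed in a cluster of their own is a separate design decision.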

tra38 avatar Sep 02 '17 23:09 tra38

Same problem here. Hope to see an answer.

mahaina avatar Jul 06 '18 09:07 mahaina

Hi @mahaina. I'll see if I can work on this issue, probably in the next two weeks. If you have a sample corpus that triggers this error reliably, please send it over so that I can use it as test material. It isn't required: the three sentences from the OP already let me reproduce the error reliably, but your corpus might have some unique characteristics as well.

tra38 avatar Jul 15 '18 05:07 tra38