classifier-reborn

LSI is broken af.

Open henrebotha opened this issue 9 years ago • 21 comments

I'm on Ruby 2.2.4. I'm trying to use LSI. Nothing works, and the error messages SUCK. I've tried both the last release (i.e. the gem version) and the latest commit from GitHub.

lsi = ClassifierReborn::LSI.new
training_data = ["Bcom", "Corporate Administration", "Forensic Auditing"]
category = :accounting
training_data.each do |d|
  begin
    lsi.add_item(d, category)
  rescue StandardError => e
    puts "#{d} misbehaving: #{e.message}"
  end
end

#=> Forensic Auditing misbehaving: comparison of Float with NaN failed

Better yet, if I swap the order of the training data, I get this:

lsi = ClassifierReborn::LSI.new
training_data = ["Corporate Administration", "Forensic Auditing", "Bcom"]
category = :accounting
training_data.each do |d|
  begin
    lsi.add_item(d, category)
  rescue StandardError => e
    puts "#{d} misbehaving: #{e.message}"
  end
end

#=> Forensic Auditing misbehaving: comparison of Float with NaN failed
#=> Bcom misbehaving: comparison of Float with NaN failed

henrebotha avatar May 20 '16 14:05 henrebotha

There are some known issues with LSI. Are you using GNU GSL or the native Ruby version? The native Ruby version relies on a buggy Ruby implementation of a matrix transform (discussed in #30) and throws this type of error for some inputs. If that's what you're using, switching to GNU GSL will fix this. If you're already using GNU GSL, this will require some digging.

Ch4s3 avatar May 20 '16 14:05 Ch4s3

You're fast! I am using the native Ruby version. I'll hit up GNU GSL and see what happens.

If I were you I'd mention this in the Readme.

henrebotha avatar May 20 '16 14:05 henrebotha

I happened to be on the issues page. Yeah, let me know how GNU GSL works out. I need to rewrite the SVD, but I'm not a great C programmer, so the process has been slow, to say the least. If you're trying to train with small inputs, especially ones that use abbreviations, the matrix transform is highly likely to break in the Ruby-only version.

Ch4s3 avatar May 20 '16 14:05 Ch4s3

While I have you, I'm getting this:

GSL::ERROR::EUNIMPL: Ruby/GSL error code 24, svd of MxN matrix, M<N, is not implemented (file svd.c, line 61), the requested feature is not (yet) implemented
from /Users/leaply/.rbenv/versions/2.2.4/lib/ruby/gems/2.2.0/bundler/gems/classifier-reborn-4e3bb14d6388/lib/classifier-reborn/lsi.rb:292:in `SV_decomp'

henrebotha avatar May 20 '16 14:05 henrebotha

Hmm, this could be related to https://github.com/SciRuby/rb-gsl/issues/21. I'm investigating.

Ch4s3 avatar May 20 '16 14:05 Ch4s3

Which version of GSL did you pull down?

Ch4s3 avatar May 20 '16 14:05 Ch4s3

1.16 via homebrew

henrebotha avatar May 20 '16 14:05 henrebotha

1.16 might work, let me try to pull down fresh versions later and try locally.

Ch4s3 avatar May 20 '16 15:05 Ch4s3

I haven't gotten anywhere with this. Can anyone else reproduce it?

Ch4s3 avatar May 28 '16 23:05 Ch4s3

@henrebotha can you try with the latest master to see if #77 raises an error on your input?

Ch4s3 avatar Nov 29 '16 17:11 Ch4s3

That's gonna take some doing. I'll try when I have access to a Mac.

henrebotha avatar Nov 29 '16 17:11 henrebotha

@henrebotha have you tried this yet?

Ch4s3 avatar Dec 30 '16 22:12 Ch4s3

I intend to close this if there's no more action in the next few days.

Ch4s3 avatar Jan 06 '17 20:01 Ch4s3

@Ch4s3 @henrebotha I'm seeing the same issue with my data and can reproduce with this script:

require 'classifier-reborn'

lsi = ClassifierReborn::LSI.new

# Without gsl this raises NoMethodError
# /classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:143:
# in `block in build_index': undefined method `normalize' for nil:NilClass

# With gsl this raises GSL::ERROR::EUNIMPL
# /classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:292:in `SV_decomp':
# Ruby/GSL error code 24, svd of MxN matrix, M<N, is not implemented (file svd.c, line 60),
# the requested feature is not (yet) implemented

lsi.add_item 'England', 'xx'
lsi.add_item 'England & Wales', 'xx'
lsi.add_item 'England And Wales', 'xx'

I'm using GNU GSL. I tried upgrading from 2.2.1 to 2.3, and that didn't fix it.

Related to this TODO in lsi.rb?
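The GSL error message says the SVD is only implemented for matrices with at least as many rows as columns (M >= N). A common workaround is to decompose the transpose instead: if A = U * S * V^T, then A^T = V * S * U^T, so U and V simply swap roles. The following is a minimal plain-Ruby sketch of that guard. The method names `orient_for_svd` and `safe_svd` are hypothetical and not part of classifier-reborn, and whether lsi.rb should handle the case this way is an assumption, not something confirmed in this thread.

```ruby
# Sketch only: these helpers are hypothetical, not classifier-reborn API.

# Returns [matrix, transposed], where matrix has rows >= columns.
# `rows` is an array of equal-length arrays (M rows, N columns).
def orient_for_svd(rows)
  m = rows.size
  n = rows.first.size
  return [rows, false] if m >= n

  [rows.transpose, true]
end

# Wraps any decomposition block that returns [u, v, s] and needs rows >= columns.
# If the input had to be transposed, U and V are swapped back on the way out.
def safe_svd(rows)
  oriented, transposed = orient_for_svd(rows)
  u, v, s = yield(oriented)
  transposed ? [v, u, s] : [u, v, s]
end

# 2 rows, 3 columns: the M < N case named in the error message
wide = [[1, 0, 2],
        [0, 3, 1]]
oriented, transposed = orient_for_svd(wide)
puts "decompose a #{oriented.size}x#{oriented.first.size} matrix (transposed: #{transposed})"
```

With rb-gsl loaded, the block passed to `safe_svd` could be something like `{ |r| GSL::Matrix[*r].SV_decomp }`. That is untested here and only meant to show where the real decomposition would plug in.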

timcraft avatar Jan 19 '17 12:01 timcraft

Any ideas on this? I'm seeing the Ruby/GSL-derived exception in SV_decomp whenever I try to build an index on more than around 2,000 sentences. I have 4,007 sentences I'd like to index. For those 2000 the classifier works great for my purpose, so I'm really eager to find a way to get this working properly, if possible...

(To be fair, it probably has less to do with how many sentences I have and more to do with some sentence beyond the first 2,000 causing a problem like the ones in the comments above...)

mepatterson avatar Mar 09 '17 20:03 mepatterson

@mepatterson I'd guess you have some malformed input. Can you wrap your training in a begin/rescue and see which doc/line blows it up?
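A minimal sketch of that begin/rescue approach, assuming you train one document at a time. The helper name `find_failing_docs` is hypothetical, and the block below uses a stand-in check instead of `lsi.add_item` so that it runs without the gem.

```ruby
# Hypothetical helper: yields each document to the block and records
# every document for which the block raises, instead of stopping.
def find_failing_docs(docs)
  failures = []
  docs.each_with_index do |doc, index|
    begin
      yield doc
    rescue StandardError => e
      failures << { index: index, doc: doc, error: e.class.name, message: e.message }
    end
  end
  failures
end

# Stand-in for the real training call (e.g. lsi.add_item(doc, category)).
docs = ['first sentence', '', 'third sentence']
failures = find_failing_docs(docs) do |doc|
  raise ArgumentError, 'empty document' if doc.empty?
end

failures.each do |f|
  puts "doc ##{f[:index]} #{f[:doc].inspect}: #{f[:error]} (#{f[:message]})"
end
```

Recording the exception class as well as the message helps separate the GSL error from the native-Ruby NaN or nil errors reported earlier in the thread.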

@timcraft I know this sounds stupid, but have you double checked that you're actually using GNU GSL? It may not have loaded correctly.
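One quick way to check is to try the require yourself and print what you get. This is only a sketch: `gsl_status` is a hypothetical helper name, and it only tells you whether the `gsl` gem can be loaded in your Ruby, not which code path classifier-reborn ends up taking.

```ruby
# Hypothetical helper: reports whether the rb-gsl bindings can be loaded.
def gsl_status
  require 'gsl'
  "GSL/#{GSL::VERSION} RubyGSL/#{GSL::RUBY_GSL_VERSION} loaded"
rescue LoadError => e
  # Raised when the gem is missing or its native extension cannot be loaded.
  "gsl could not be loaded: #{e.message}"
end

puts gsl_status
```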

Ch4s3 avatar Mar 10 '17 19:03 Ch4s3

Actually, I can confirm @timcraft's repro as well. Just those three add_item lines cause the GSL crash every time on my machine, using the very latest gsl and rb-gsl.

mepatterson avatar Mar 10 '17 19:03 mepatterson

Ok, I'll try to dig in this weekend.

Ch4s3 avatar Mar 10 '17 19:03 Ch4s3

@Ch4s3 Yep, it appears to be loaded ok. I added this at the top of the script (matrix code from gsl-2.1.0.2/examples/linalg/SV.rb which uses SV_decomp):

puts "Using GSL/#{GSL::VERSION} RubyGSL/#{GSL::RUBY_GSL_VERSION}"
a = GSL::Matrix[[3, 5, 2], [6, 2, 1], [4, 7, 3]]
u, v, s = a.SV_decomp
p u*GSL::Matrix.diagonal(s)*v.trans

The output is "Using GSL/2.3 RubyGSL/2.1.0.2", followed by the correct matrix.

timcraft avatar Mar 11 '17 09:03 timcraft

Same here.

I have GSL installed, but it's not even loaded.

elisaado avatar Nov 28 '17 21:11 elisaado

@elisaado can you post any details?

Ch4s3 avatar Nov 28 '17 22:11 Ch4s3