Intermittent error in parser.train(): corrupted model "type"
For a good while, I've been experiencing an intermittent error when calling parser.train. The error is raised from the Wapiti level and looks like this:
<..> /wapiti-ruby/lib/wapiti/options.rb:154:in `validate!': unknown type: crfxpr (ArgumentError)
Þ¸Qbød; unknown algorithm: l-bfgseg}¸Qbø
The strange thing is that exactly the same code sometimes works fine. So far I've narrowed it down to this: the following code prints different strings on different runs:
require 'bundler'
Bundler.setup # -> hopefully a clean Wapiti 1.0.2
require 'anystyle'
AnyStyle::Dictionary.defaults[:adapter] = :hash # mingw-ruby gdbm is broken
puts AnyStyle.parser.model.options.type
Sometimes it prints "crf" (as expected, I think) and sometimes "crf#####", where "#####" is a random run of characters.
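To make the corruption easier to report, a small diagnostic along these lines might help. It is only a rough sketch: `inspect_type` and `KNOWN_TYPES` are hypothetical names, not part of AnyStyle or wapiti-ruby, and the list of valid types is my assumption about what Wapiti accepts. It dumps the raw bytes of the returned string and separates any trailing garbage from the recognised prefix.

```ruby
# Hypothetical diagnostic helper; not part of AnyStyle or wapiti-ruby.
# Assumption: these are the model types Wapiti accepts.
KNOWN_TYPES = %w[maxent memm crf].freeze

def inspect_type(value)
  # Work on a binary copy so invalid byte sequences cannot raise
  # encoding errors during comparison.
  raw = value.to_s.b
  prefix = KNOWN_TYPES.find { |t| raw.start_with?(t) }
  {
    raw: raw.inspect,                   # escaped, printable form of the string
    bytes: raw.bytes,                   # every byte, including any garbage
    known: KNOWN_TYPES.include?(raw),   # true only for an exact match
    # Bytes left over after a recognised prefix (everything if none matches).
    garbage_bytes: prefix ? raw.bytes.drop(prefix.bytesize) : raw.bytes
  }
end

# Intended use, with the call from the snippet above:
#   p inspect_type(AnyStyle.parser.model.options.type)
```

Since the value differs between runs rather than within one, it probably makes sense to run this several times in fresh processes and compare the `garbage_bytes`.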
I'm also getting consistent segfaults with parser.check("foo.xml") unless the result is 100% correct. They appear to arise from native_label in wapiti/lib/model.rb.
This is all on Windows with MinGW. Do you see anything similar?
Thanks, I'll take a look at this!
I haven't seen the error myself (I can reproduce segfaults when running check on a dataset containing labels which are not present in the model, but that's definitely unrelated to this).
Is gdbm definitely broken on Windows? Since it's part of the standard library, I was hoping this would be easy to install across platforms.
Thanks for checking and confirming.
GDBM is fine on Windows, e.g. with the "standard" Ruby installer. It's just that the MinGW builds ship a broken package, and I haven't got round to working out why.