Unable to reproduce results for CONLL2003 English dataset

Open nipunsadvilkar opened this issue 6 years ago • 0 comments

Hey @icoxfog417, @Hironsan,

First of all, thanks for putting so much effort into creating such a modular NER codebase.

I trained namaco NER on the CoNLL 2003 English dataset, but calling tagger.analyze on a sample sentence fails.

Example:

# Predicting on a sentence
tagger = namaco.Tagger('models/model.h5', preprocessor=p, tokenizer=str.split)
sent = 'President Obama is speaking at the White House.'
print(tagger.analyze(sent))

The error is as follows:

Traceback (most recent call last):
  File "predict_conll.py", line 38, in <module>
    print(tagger.analyze(sent))
  File "/home/ubuntu/namaco/namaco/tagger.py", line 65, in analyze
    pred = self.predict(words)
  File "/home/ubuntu/namaco/namaco/tagger.py", line 23, in predict
    pred = self.model.predict([X[0], length])
  File "/home/ubuntu/anaconda2/envs/lstm_namaco/lib/python3.6/site-packages/keras/engine/training.py", line 1695, in predict
    check_batch_axis=False)
  File "/home/ubuntu/anaconda2/envs/lstm_namaco/lib/python3.6/site-packages/keras/engine/training.py", line 82, in _standardize_input_data
    '...')
ValueError: Error when checking model : the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 2 arrays: [array([[1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32), array([9])]...
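The message says the loaded model declares one input while predict passes two arrays (the word IDs and the sequence length). A minimal sketch of how that mismatch could be checked before calling predict, assuming the loaded Keras model exposes its input tensors through the `inputs` attribute. The helper `input_count_mismatch` and the stub model are hypothetical and are not part of namaco:

```python
from types import SimpleNamespace


def input_count_mismatch(model, arrays):
    """Return (expected, received) when the counts differ, else None."""
    expected = len(model.inputs)  # number of inputs the model declares
    received = len(arrays)        # number of arrays about to be passed
    if expected != received:
        return expected, received
    return None


# Hypothetical stand-in for the loaded model: a single declared input.
stub_model = SimpleNamespace(inputs=["word_ids"])

# Same structure as the arrays shown in the traceback: word IDs and length.
arrays = [[[1, 1, 1, 1, 1, 1, 1, 1, 1]], [9]]

print(input_count_mismatch(stub_model, arrays))  # prints (1, 2)
```

This only reports the mismatch. It does not say whether the saved model or the tagger's predict call is the side that needs to change.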

Another thing to note: I tried the basic Japanese model you provide in /data/models/ja (model.h5 and preprocessor.pkl), and it works fine.

But I am unable to do the same with the English dataset. Could you please help me resolve this issue?

Thanks, Nipun

nipunsadvilkar, Sep 03 '18 14:09