Unable to reproduce results for CONLL2003 English dataset
Hey @icoxfog417, @Hironsan,
First of all, thanks for putting so much effort into creating such a modular NER codebase.
I trained a namaco NER model on the CoNLL 2003 English dataset, but tagger.analyze fails when I try to predict on sample sentences.
Example:
# Predicting on a sentence
tagger = namaco.Tagger('models/model.h5', preprocessor=p, tokenizer=str.split)
sent = 'President Obama is speaking at the White House.'
print(tagger.analyze(sent))
The error is as follows:
Traceback (most recent call last):
File "predict_conll.py", line 38, in <module>
print(tagger.analyze(sent))
File "/home/ubuntu/namaco/namaco/tagger.py", line 65, in analyze
pred = self.predict(words)
File "/home/ubuntu/namaco/namaco/tagger.py", line 23, in predict
pred = self.model.predict([X[0], length])
File "/home/ubuntu/anaconda2/envs/lstm_namaco/lib/python3.6/site-packages/keras/engine/training.py", line 1695, in predict
check_batch_axis=False)
File "/home/ubuntu/anaconda2/envs/lstm_namaco/lib/python3.6/site-packages/keras/engine/training.py", line 82, in _standardize_input_data
'...')
ValueError: Error when checking model : the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 2 arrays: [array([[1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32), array([9])]...
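For what it's worth, the ValueError above seems to indicate that the loaded model graph has a single input tensor, while Tagger.predict feeds it two arrays (the word ids and the sentence length). This mismatch could be checked before calling predict; below is a minimal sketch, where check_inputs is a hypothetical helper, not part of namaco or Keras:

```python
# Hypothetical diagnostic: compare the number of input tensors the saved
# model expects with the number of arrays being passed to predict().
def check_inputs(expected_n_inputs, arrays):
    """Return True if the number of passed arrays matches the model's inputs."""
    return expected_n_inputs == len(arrays)

# Values from the traceback above: the model expects 1 array,
# but Tagger.predict passes [word_ids, lengths] (2 arrays).
word_ids = [[1, 1, 1, 1, 1, 1, 1, 1, 1]]
lengths = [9]
print(check_inputs(1, [word_ids, lengths]))  # → False (mismatch)
```

With a real Keras model, the expected input count would be len(model.inputs); if it is 1 for the English model but 2 for the Japanese one, that would explain why only the latter works.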
Another thing to note: the pretrained Japanese model you provide in /data/models/ja (model.h5 and preprocessor.pkl) works fine. But I am unable to get the same working for the English dataset. Could you please help me resolve this issue?
Thanks, Nipun