deeptype
Replicating model training and evaluation
Hi,
I'm having some problems replicating the reported metrics. To summarize the whole issue:
Training

When I trained the system using the 108 type axes in type_classifier.py, training saturates at around an 83% F1 score and stops with the message:

No improvements for 40 epochs. Stopping ...
Could you please let us know the parameters you used for training, the final F1 score of the model that gave you a disambiguation accuracy close to 99% on CoNLL, and how many epochs you trained for? I used the parameters listed in the tutorials:
--cudnn --fused --hidden_sizes 200 200 --batch_size 256 --max_epochs 10000 --name TypeClassifier --weight_noise 1e-6 --save_dir my_great_model --anneal_rate 0.9999
Evaluation
I used the following evaluation method: for each mention (in each sentence) in the CoNLL dataset, I first get the predicted Wikipedia entity from the LSTM model (trained to the 83% F1 score above); if it exactly matches the ground-truth Wikipedia entity for that mention in CoNLL, I count it as a correct prediction, i.e. I'm matching Wikidata QIDs (a minimal sketch of this is below).

This gives an accuracy close to 75%. Am I doing something wrong here? Please let me know; a few training tips would be great too :) Sorry for bombarding you with questions; I hope this is helpful to other folks who are eager to contribute to this work :) Thanks!
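In case it helps to see it concretely, here is a minimal sketch of that evaluation loop. Both predict_entity (my wrapper around the trained LSTM disambiguator, returning its top-ranked Wikidata QID) and conll_mentions (an iterable of (mention, gold_qid) pairs built from the CoNLL annotations) are placeholders for my own glue code, not functions from this repo:

def exact_match_accuracy(conll_mentions, predict_entity):
    # conll_mentions: iterable of (mention, gold_qid) pairs from CoNLL
    # predict_entity: callable returning the model's top-ranked Wikidata QID
    correct = 0
    total = 0
    for mention, gold_qid in conll_mentions:
        predicted_qid = predict_entity(mention)
        if predicted_qid == gold_qid:  # exact QID match counts as correct
            correct += 1
        total += 1
    return correct / total if total else 0.0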
Hi
Did you figure out why? I'm running into a similar issue and not getting the expected performance.