deeptype
Replicating model training and evaluation
Hi,
I'm having some problems replicating the reported metrics. To summarize the whole issue:
Training

When I trained the system using the 108 type axes in type_classifier.py, training saturates at around an 83% F1 score and stops with the message:

No improvements for 40 epochs. Stopping ...
Could you please let us know the parameters you used for training, the final F1 score of the model that gave you a disambiguation accuracy close to 99% on CoNLL, and how many epochs you trained for? I used the parameters listed in the tutorials:
--cudnn --fused --hidden_sizes 200 200 --batch_size 256 --max_epochs 10000 --name TypeClassifier --weight_noise 1e-6 --save_dir my_great_model --anneal_rate 0.9999
Evaluation
I used the following evaluation method: for each mention (in each sentence) in the CoNLL dataset, I first get the predicted Wikipedia entity from the LSTM model (trained to the 83% F1 score above); if it exactly matches the ground-truth Wikipedia entity for that mention in CoNLL, I count it as a correct prediction, i.e. I'm matching Wikidata QIDs (a minimal sketch of this is below).

This gives an accuracy close to 75%. Am I doing something wrong here? Please let me know; a few training tips would be great too :) Sorry for bombarding you with questions; I hope this is helpful to other folks who are eager to contribute to this work :) Thanks!
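In case it helps to see it concretely, here is a minimal sketch of that evaluation loop. Both predict_entity (my wrapper around the trained LSTM disambiguator, returning its top-ranked Wikidata QID) and conll_mentions (an iterable of (mention, gold_qid) pairs built from the CoNLL annotations) are placeholders for my own glue code, not functions from this repo:

def exact_match_accuracy(conll_mentions, predict_entity):
    # conll_mentions: iterable of (mention, gold_qid) pairs from CoNLL
    # predict_entity: callable returning the model's top-ranked Wikidata QID
    correct = 0
    total = 0
    for mention, gold_qid in conll_mentions:
        predicted_qid = predict_entity(mention)
        if predicted_qid == gold_qid:  # exact QID match counts as correct
            correct += 1
        total += 1
    return correct / total if total else 0.0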
Hi
Did you figure out why? I'm running into a similar issue and not getting the expected performance.