Different Recall & Accuracy Results
Hi,
I ran your code successfully, but some of my results differ from yours, especially for batch_size 48. I used your parameters, yet the accuracy and recall values differ by a large margin. I wonder why; maybe other parameters were changed on your side? In particular, for batch size 48, why are the accuracy and recall values so different from your reported results?
Also, the biobert model didn't run; I got an error.
Have a nice day.
```
===PARAMETERS===
debug False
debug_data_num 200
dataset bc5cdr
dataset_dir ./dataset/
serialization_dir ./serialization_dir/
preprocessed_doc_dir ./preprocessed_doc_dir/
kb_dir ./mesh/
cached_instance False
lr 1e-05
weight_decay 0
beta1 0.9
beta2 0.999
epsilon 1e-08
amsgrad False
word_embedding_dropout 0.1
cuda_devices 0
scoring_function_for_model indexflatip
num_epochs 10
patience 10
batch_size_for_train 48
batch_size_for_eval 48 (I also tried 16)
bert_name bert-base-uncased
max_context_len 50
max_mention_len 12
max_canonical_len 12
max_def_len 36
model_for_training biencoder (I also tried the other models)
candidates_dataset ./candidates.pkl
max_candidates_num 10
search_method_for_faiss indexflatip
how_many_top_hits_preserved 50
===PARAMETERS END===
```

BioBERT error:

I'm sorry for the delayed reply, and thank you for the detailed comparison experiment. I don't have time right now to tune the parameters for this code, but here is what I noticed about your comment.
- Batch size matters a lot during training: with in-batch negative training, as used in the BLINK model and in Gillick et al.'s model, each mention is contrasted against the other examples in the same batch, so the batch size directly controls how many negatives each example sees. Changing it can noticeably shift accuracy and recall.
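To make the in-batch point concrete, here is a minimal numpy sketch (not this repository's actual code) of the in-batch softmax loss used by bi-encoder models such as BLINK: the gold entity sits on the diagonal of the score matrix, and every other entity in the batch serves as a negative, so the batch size sets how hard the contrastive task is.

```python
import numpy as np

def in_batch_loss(mention_emb, entity_emb):
    """In-batch softmax loss for a bi-encoder.

    mention_emb, entity_emb: (B, d) arrays. Row i of entity_emb is the gold
    entity for mention i; the other B - 1 rows act as negatives.
    """
    scores = mention_emb @ entity_emb.T                   # (B, B) dot-product scores
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # gold pairs on the diagonal
```

With batch_size_for_train 48, each mention is contrasted with 47 in-batch negatives; at 16 it sees only 15, which plausibly explains gaps between the two settings.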
- Why the biobert model does not load is not immediately clear to me. If you have a newer version of transformers or allennlp installed, it is likely that the checkpoint will not load with this code; try matching the versions this repository was developed against.
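One low-effort way to catch such a mismatch early is a version guard before loading the model. This is just a sketch; the `"2.11.0"` pin in the usage comment is a placeholder assumption, and the real pins should come from this repository's requirements file.

```python
def version_tuple(v):
    # "2.11.0" -> (2, 11, 0); assumes a plain x.y.z version string
    return tuple(int(p) for p in v.split(".")[:3])

def is_newer_than(installed, pinned):
    # True when the installed library is newer than the version the code expects
    return version_tuple(installed) > version_tuple(pinned)

# Hypothetical usage; "2.11.0" is a placeholder pin, not the repo's actual one:
# import transformers
# if is_newer_than(transformers.__version__, "2.11.0"):
#     print("transformers is newer than this code expects; BioBERT may fail "
#           "to load. Try: pip install transformers==2.11.0")
```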