BLINK
Reproduce the recall result on Zero-shot EL dataset
Hi, I used the code and hyperparameters you released on GitHub to train bert-base-uncased on the Zero-shot EL dataset, but I can't reproduce the results reported in the paper. How should I adjust the hyperparameters? These are the ones I used for training:
learning_rate 1e-05
num_train_epochs 5
max_context_length 128
max_cand_length 128
train_batch_size 128
eval_batch_size 64
bert_model bert-base-uncased
type_optimization all_encoder_layers
The Recall@64 on Zero-shot Train/Valideation/Test are 93.12, 91.44, 82.06
@xiepuzhao Thanks for the issue, we'll take a look. What are the numbers you're getting?
@ledw Thank you very much for your reply! The Recall@64 on Zero-shot Train/Validation/Test reported in your paper is 93.12, 91.44, 82.06, but I got 0.9614, 0.8742, 0.7781 (i.e., 96.14, 87.42, 77.81).
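For anyone comparing numbers: the paper reports percentages while the evaluation script prints fractions, so the two scales are the same metric. A minimal sketch of how Recall@64 can be computed from a score matrix (the function and arrays here are illustrative, not BLINK's actual evaluation code):

```python
import numpy as np

def recall_at_k(scores: np.ndarray, gold_ids: np.ndarray, k: int = 64) -> float:
    # scores: (num_mentions, num_candidates) bi-encoder scores
    # gold_ids: (num_mentions,) column index of the gold entity per mention
    top_k = np.argsort(-scores, axis=1)[:, :k]        # top-k candidates per mention
    hits = (top_k == gold_ids[:, None]).any(axis=1)   # was the gold entity retrieved?
    return hits.mean()                                # fraction in [0, 1], e.g. 0.7781
```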
I ran into the same problem as @xiepuzhao. Is this because the knowledge-distillation step is not included in train_biencoder.py?
@ledw @horseee Did you find any solutions?
@leezythu In my previous experiment, I set gradient_accumulation_steps to 8. However, batch size is a particularly important hyperparameter for this experiment: when gradient accumulation is used, the configured train_batch_size is split across the accumulation steps, so it should be multiplied by gradient_accumulation_steps to keep the per-step batch the same.
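To see why the per-step batch matters here: the bi-encoder is trained with in-batch negatives, so every other candidate in a step's batch acts as a negative, and gradient accumulation shrinks that pool. A minimal sketch of this loss, assuming (as the results below suggest) that the configured batch is split across accumulation steps; the names are illustrative, not BLINK's exact code:

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(ctx_emb: torch.Tensor, cand_emb: torch.Tensor) -> torch.Tensor:
    # ctx_emb, cand_emb: (per_step_batch, dim) outputs of the two encoders.
    scores = ctx_emb @ cand_emb.t()                        # (B, B) similarity matrix
    target = torch.arange(scores.size(0), device=scores.device)  # diagonal = gold pairs
    return F.cross_entropy(scores, target)                 # each mention sees B - 1 negatives

# With train_batch_size 128 and gradient_accumulation_steps 8, each step
# sees 128 // 8 = 16 examples, i.e. 15 in-batch negatives instead of 127.
```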
Here are my experimental results:
train_batch_size 128, gradient_accumulation_steps 8: test recall@64 0.772
train_batch_size 128, gradient_accumulation_steps 4: test recall@64 0.789
train_batch_size 128, gradient_accumulation_steps 1: test recall@64 0.803
train_batch_size 192, gradient_accumulation_steps 1: test recall@64 0.809
80.3 is somewhat lower than 82.06, but that is reasonable given that I did not incorporate knowledge distillation or negative sampling.
Command for Training:
python blink/biencoder/train_biencoder.py --data_path data/zeshel/blink_format --output_path models/ --learning_rate 1e-05 --num_train_epochs 5 --max_context_length 128 --max_cand_length 128 --train_batch_size 128 --eval_batch_size 32 --bert_model bert-base-uncased --type_optimization all_encoder_layers --data_parallel
@horseee Thank you for your reply! I will try that.
My Recall@64 on Zero-shot Train/Validation/Test is 0.9574, 0.8939, 0.8035.
@namnam3000 You can follow the instructions here
I found a small bug in the load_entity_dict_zeshel function: doc_list.append(text[:256]) keeps only the first 256 characters of each entity's description. It should be replaced by doc_list.append(" ".join(text.split()[:128])), which keeps the first 128 tokens instead and raises the recall.
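To illustrate the difference, a self-contained sketch (the helper name is mine, not from the repo): truncating by characters keeps only a few dozen tokens of description, while truncating by whitespace tokens roughly fills the 128-token window that max_cand_length allows:

```python
def truncate_description(text: str, max_tokens: int = 128) -> str:
    # Keep the first max_tokens whitespace-separated tokens of an entity
    # description, instead of the first 256 characters.
    return " ".join(text.split()[:max_tokens])

text = "Zero-shot entity linking requires reading entity descriptions. " * 20
print(len(text[:256].split()))                   # character cut: 29 tokens
print(len(truncate_description(text).split()))   # token cut: 128 tokens
```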