Reproduce the recall result on Zero-shot EL dataset

xiepuzhao opened this issue 4 years ago · 11 comments

Hi, I used the code and hyper-parameters you released on GitHub to train bert-base-uncased on the Zero-shot EL dataset, but I can't reproduce the results reported in the paper. How should I adjust the hyper-parameters? These are the hyper-parameters I used for training:

- learning_rate: 1e-05
- num_train_epochs: 5
- max_context_length: 128
- max_cand_length: 128
- train_batch_size: 128
- eval_batch_size: 64
- bert_model: bert-base-uncased
- type_optimization: all_encoder_layers

xiepuzhao avatar Jan 26 '21 10:01 xiepuzhao

The Recall@64 on Zero-shot Train/Validation/Test is 93.12, 91.44, 82.06.

xiepuzhao avatar Jan 26 '21 10:01 xiepuzhao

@xiepuzhao Thanks for the issue, we'll take a look. What are the numbers you're getting?

ledw avatar Jan 27 '21 04:01 ledw

@ledw Thank you very much for your reply! The Recall@64 on Zero-shot Train/Validation/Test reported in your paper is 93.12, 91.44, 82.06, but I got 0.9614, 0.8742, 0.7781.

xiepuzhao avatar Jan 27 '21 12:01 xiepuzhao

I met the same problem as @xiepuzhao. Is this because the knowledge distillation step is not included in train_biencoder.py?

horseee avatar Feb 06 '21 07:02 horseee

@ledw @horseee Have you found any solutions?

leezythu avatar Mar 09 '21 00:03 leezythu

@leezythu In my previous experiment, I set gradient_accumulation_steps to 8. However, batch size is a particularly important hyper-parameter for this experiment, and if gradient accumulation is used, the train_batch_size you pass in should be multiplied by gradient_accumulation_steps to compensate.
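For intuition, here is a minimal, hypothetical sketch of why the per-forward batch size matters for a bi-encoder trained with in-batch negatives. This is not BLINK's actual training code; it only assumes that the script splits train_batch_size across gradient_accumulation_steps, so each forward pass sees a smaller batch and therefore fewer in-batch negatives.

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(ctx_emb, cand_emb):
    # Score every mention context against every candidate entity in the
    # batch; the diagonal entries are the gold (positive) pairs.
    scores = ctx_emb @ cand_emb.t()          # (B, B)
    target = torch.arange(scores.size(0))    # gold candidate = own row index
    return F.cross_entropy(scores, target)

# Illustration: with train_batch_size=128 and gradient_accumulation_steps=8,
# each forward pass may only see 128 // 8 = 16 examples, i.e. 15 in-batch
# negatives per positive instead of 127, which weakens the training signal.
micro_batch = 128 // 8
ctx = torch.randn(micro_batch, 768)
cand = torch.randn(micro_batch, 768)
print(in_batch_negative_loss(ctx, cand))
```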

Here are my experimental results:

| train_batch_size | gradient_accumulation_steps | test Recall@64 |
| --- | --- | --- |
| 128 | 8 | 0.772 |
| 128 | 4 | 0.789 |
| 128 | 1 | 0.803 |
| 192 | 1 | 0.809 |

80.3 is somewhat lower than 82.06, but that seems reasonable given that I didn't incorporate knowledge distillation or negative sampling.
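For reference, a minimal sketch of what mining hard negatives from bi-encoder scores could look like; this is an illustrative assumption, not the procedure used in the paper or the repository.

```python
import torch

def mine_hard_negatives(ctx_emb, cand_emb, gold_ids, k=10):
    """Return the top-k highest-scoring non-gold candidates per mention."""
    scores = ctx_emb @ cand_emb.t()                                  # (num_mentions, num_entities)
    scores[torch.arange(len(gold_ids)), gold_ids] = float("-inf")    # mask the gold entity
    return scores.topk(k, dim=1).indices                             # (num_mentions, k) negative ids
```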

Command for training: `python blink/biencoder/train_biencoder.py --data_path data/zeshel/blink_format --output_path models/ --learning_rate 1e-05 --num_train_epochs 5 --max_context_length 128 --max_cand_length 128 --train_batch_size 128 --eval_batch_size 32 --bert_model bert-base --type_optimization all_encoder_layers --data_parallel`

horseee avatar Mar 09 '21 02:03 horseee

@horseee Thank you for your reply! I will try that.

leezythu avatar Mar 09 '21 02:03 leezythu

My results are 0.9574, 0.8939, 0.8035.

wutaiqiang avatar May 25 '21 11:05 wutaiqiang

These are the Recall@64 on the Zero-shot Train/Validation/Test splits.

wutaiqiang avatar May 25 '21 11:05 wutaiqiang

@namnam3000 You can follow the instructions here

leezythu avatar Jun 14 '21 08:06 leezythu

I found a small bug in the `load_entity_dict_zeshel` function: `doc_list.append(text[:256])` truncates the entity description to its first 256 characters, which covers only a small part of the description. It should be replaced by `doc_list.append(" ".join(text.split()[:128]))`, which keeps the first 128 tokens instead and raises the recall.
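A small illustrative snippet of the difference between the two truncation strategies; the text here is made up for the example, and only the two truncation expressions come from the code above.

```python
# Hypothetical example: compare character-level vs. token-level truncation
# of an entity description before it is fed to the candidate encoder.
text = " ".join(f"word{i}" for i in range(300))

char_truncated = text[:256]                      # first 256 characters only
token_truncated = " ".join(text.split()[:128])   # first 128 whitespace tokens

print(len(char_truncated.split()))   # far fewer than 128 words survive
print(len(token_truncated.split()))  # 128 words, matching max_cand_length
```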

leezythu avatar Jul 30 '21 09:07 leezythu