are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
Hi, I am trying to reproduce your BERT results. I followed the Prerequisite:
```
# Pytorch pretrained BERT
git clone https://github.com/pmichel31415/pytorch-pretrained-BERT
cd pytorch-pretrained-BERT
git checkout paul
cd ..
```
...
Sorry to bother you. I hit a bug while running "heads_pruning.sh", and the error is:
12:21:27-INFO: ***** Running evaluation *****
12:21:27-INFO: Num examples = 9815
12:21:27-INFO: Batch size =...
Hi @pmichel31415! 1. In are-16-heads-really-better-than-1/experiments/MT/prune_wmt.sh you have `--raw-text $EXTRA_OPTIONS,` and I don't know what it means. Can you explain it and how to use it? It is the...