BERT-of-Theseus
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
Hi, I cannot reproduce the CoLA score reported in the paper. I followed the HuggingFace repo to train a predecessor model, reaching a Matthews correlation coefficient of 55.76....
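For reference, a minimal sketch of how the CoLA metric is computed; it assumes scikit-learn is available (the HuggingFace GLUE evaluation also relies on it) and uses made-up predictions and labels:

```python
from sklearn.metrics import matthews_corrcoef

# CoLA is scored with the Matthews correlation coefficient (MCC).
# preds/labels below are hypothetical 0/1 acceptability judgments
# from a fine-tuned predecessor on the CoLA dev set.
preds = [1, 0, 1, 1, 0, 1]
labels = [1, 0, 0, 1, 0, 1]

# MCC lies in [-1, 1]; GLUE reports it scaled by 100 (e.g. 0.5576 -> 55.76).
mcc = matthews_corrcoef(labels, preds)
print(f"Matthews corr: {100 * mcc:.2f}")
```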
After training like that:

```bash
# For compression with a replacement scheduler
export GLUE_DIR=glue_script/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /home/bert-base \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case...
```
What does "max_length" mean in the successor's config.json? I set max_seq_length=128 when I run compression, but the "max_length" in the successor's config.json is 20.
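For context, a minimal sketch (assuming a recent Hugging Face transformers release) illustrating that `max_length` in config.json is a text-generation default of 20, separate from the `max_seq_length` used to encode examples during fine-tuning/compression:

```python
from transformers import BertConfig, BertTokenizer

# config.max_length is a generation default (20 for BERT configs); it is
# written into config.json but is not the fine-tuning sequence length.
config = BertConfig.from_pretrained("bert-base-uncased")
print(config.max_length)  # 20

# The sequence length used for GLUE examples comes from the tokenizer call,
# driven by the --max_seq_length flag of run_glue.py.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("An example sentence.", max_length=128,
                    padding="max_length", truncation=True)
print(len(encoded["input_ids"]))  # 128
```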
Bumps [transformers](https://github.com/huggingface/transformers) from 2.4.0 to 4.30.0. Release notes (sourced from transformers's releases): v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone. 100k: Transformers has just reached 100k stars...