task-transferability
task-transferability copied to clipboard
Data and code for our paper "Exploring and Predicting Transferability across NLP Tasks", to appear at EMNLP 2020.
Task transferability
Note: This repository is a work-in-progress.
Data and code for our paper Exploring and Predicting Transferability across NLP Tasks, to appear at EMNLP 2020.
Figure 1: A demonstration of our task embedding pipeline. Given a target task, we first compute its task embedding and then identify the most similar source task embedding (in this example, WikiHop) from a precomputed library via cosine similarity. Finally, we perform intermediate-task transfer from the selected source task to the target task.
Table of Contents
- Installation
- Data
-
Fine-tuning BERT on downstream NLP tasks
- Fine-tuning BERT on text classification/regression (CR) tasks
- Fine-tuning BERT on question answering (QA) tasks
- Fine-tuning BERT on sequence labeling (SL) tasks
-
Intermediate-task transfer with BERT
- Intermediate-task transfer to text classification/regression (CR) tasks
- Intermediate-task transfer to question answering (QA) tasks
- Intermediate-task transfer to sequence labeling (SL) tasks
-
TextEmb
- Computing TextEmb for text classification/regression (CR) tasks
- Computing TextEmb for question answering (QA) tasks
- Computing TextEmb for sequence labeling (SL) tasks
-
TaskEmb
- Computing TaskEmb for text classification/regression (CR) tasks
- Computing TaskEmb for question answering (QA) tasks
- Computing TaskEmb for sequence labeling (SL) tasks
- Pretrained models and precomputed task embeddings
- Using task embeddings for source task selection
Installation
This repository is developed on Python 3.7.5, PyTorch 1.4.0 and the Hugging Face's Transformers 2.1.1.
You should install all necessary Python packages in a virtual environment. If you are unfamiliar with Python virtual environments, please check out the user guide here.
Below, we create a virtual environment and activate it:
conda create -n task-transferability python=3.7
conda activate task-transferability
Now, install all necessary packages:
cd task-transferability
pip install -r requirements.txt
pip install [--editable] .
Data
You can download the classification/regression (CR) and question answering (QA) datasets we study at this Google Drive link. Unfortunately, most (if not all) of the sequence labeling (SL) datasets used in our paper are not publicly available; more details can be found here.
Fine-tuning BERT on downstream NLP tasks
We have separate code for fine-tuning BERT on each class of problems, including text classification/regression, question answering, and sequence labeling. If you are interested in writing a common API for fine-tuning BERT on different classes of problems, check out a homework I designed for our Advanced NLP class at this Google Colab link (see the function do_target_task_finetuning()
)!
Fine-tuning BERT on text classification/regression (CR) tasks
The following example code fine-tunes BERT on the MRPC task:
export SEED_ID=42
export DATA_DIR=/path/to/MRPC/data/dir
export MODEL_TYPE=bert
export MODEL_NAME_OR_PATH=bert-base-uncased
export TASK_NAME=MRPC
export CACHE_DIR=/path/to/cache/dir
export FINETUNE_FEATURE_EXTRACTOR=True # set to False to freeze BERT and train the output layer only
export SAVE=all # model checkpoints to save; can be a specific checkpoint, e.g., 0, 1,.., num_train_epochs
export OUTPUT_DIR=path/to/output/dir
python ./run_finetuning_CR.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--task_name ${TASK_NAME} \
--do_train \
--do_eval \
--do_lower_case \
--finetune_feature_extractor ${FINETUNE_FEATURE_EXTRACTOR}\
--data_dir ${DATA_DIR} \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--save ${SAVE} \
--seed ${SEED_ID}
Fine-tuning BERT on question answering (QA) tasks
Here is an example that fine-tunes BERT on the SQuAD-2 task:
export SEED_ID=42
export DATA_DIR=/path/to/SQuAD-2/data/dir
export MODEL_TYPE=bert
export MODEL_NAME_OR_PATH=bert-base-uncased
export VERSION_2_WITH_NEGATIVE=True
export CACHE_DIR=/path/to/cache/dir
export FINETUNE_FEATURE_EXTRACTOR=True # set to False to freeze BERT and train the output layer only
export SAVE=all # model checkpoints to save; can be a specific checkpoint, e.g., 0, 1,.., num_train_epochs
export OUTPUT_DIR=/path/to/output/dir
python ./run_finetuning_QA.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--do_train \
--do_eval \
--do_lower_case \
--version_2_with_negative ${VERSION_2_WITH_NEGATIVE} \
--finetune_feature_extractor ${FINETUNE_FEATURE_EXTRACTOR} \
--train_file ${DATA_DIR}/train.json \
--predict_file ${DATA_DIR}/dev.json \
--max_seq_length 384 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=12 \
--learning_rate 3e-5 \
--num_train_epochs 3 \
--doc_stride 128 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--save ${SAVE} \
--seed ${SEED_ID}
Fine-tuning BERT on sequence labeling (SL) tasks
This example code fine-tunes BERT on the NER task:
export SEED_ID=42
export DATA_DIR=/path/to/NER/data/dir
export MODEL_TYPE=bert
export MODEL_NAME_OR_PATH=bert-base-uncased
export LABEL_FILE=${DATA_DIR}/labels.txt
export CACHE_DIR=/path/to/cache/dir
export FINETUNE_FEATURE_EXTRACTOR=True # set to False to freeze BERT and train the output layer only
export SAVE=all # model checkpoints to save; can be a specific checkpoint, e.g., 0, 1,.., num_train_epochs
export OUTPUT_DIR=/path/to/output/dir
python ./run_finetuning_SL.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--do_train \
--do_eval \
--do_lower_case \
--labels ${LABEL_FILE} \
--finetune_feature_extractor ${FINETUNE_FEATURE_EXTRACTOR} \
--data_dir ${DATA_DIR} \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=32 \
--learning_rate 5e-5 \
--num_train_epochs 6 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--save ${SAVE} \
--seed ${SEED_ID}
Note: You might want to use a Cased
model if case information is important for your task (e.g., named entity recognition or part-of-speech tagging).
Intermediate-task transfer with BERT
Since the target task is typically different than the source task, we will discard the output layer from the model fine-tuned on the source task and incorporate BERT with another output layer to form a task-specific model for the target task. The output layer will be learned from scratch. You can do this by running our fine-tuning code with the argument model_load_mode = "base_model_only"
.
Figure 2: Our experiments show that even low-data source tasks that are on the surface very different than the target task can result in transfer gains. Out-of-class transfer (e.g., from a question answering task to a classification task) succeeds in many cases, some of which are unintuitive. Here in this violin plot, each point in a violin represents the performance on the associated target task when we perform intermediate-task transfer with a particular source task and the black line corresponds to what happens when we directly fine-tune BERT on the target task without any intermediate-task transfer.
Intermediate-task transfer to text classification/regression (CR) tasks
This example code performs intermediate-task transfer from the HotpotQA task (question answering) to the MRPC task (text classification/regression):
export SEED_ID=42
export TARGET_DATA_DIR=/path/to/MRPC/data/dir
export MODEL_TYPE=bert
export TARGET_TASK=MRPC
export CACHE_DIR=/path/to/cache/dir
export FINETUNE_FEATURE_EXTRACTOR=True
export MODEL_LOAD_MODE=base_model_only
export SAVE=all
export SOURCE_CHECKPOINT=3
export MODEL_NAME_OR_PATH=/path/to/BERT/HotpotQA/model/checkpoint-${SOURCE_CHECKPOINT}
export OUTPUT_DIR=/path/to/output/dir
python ./run_finetuning_CR.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--task_name ${TARGET_TASK} \
--do_train \
--do_eval \
--do_lower_case \
--finetune_feature_extractor ${FINETUNE_FEATURE_EXTRACTOR} \
--model_load_mode ${MODEL_LOAD_MODE} \
--data_dir ${TARGET_DATA_DIR} \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--save ${SAVE} \
--seed ${SEED_ID}
Intermediate-task transfer to question answering (QA) tasks
The following script does intermediate-task transfer from the GParent task (sequence labeling) to the CQ task (question answering):
export SEED_ID=42
export TARGET_DATA_DIR=/path/to/CQ/data/dir
export MODEL_TYPE=bert
export VERSION_2_WITH_NEGATIVE=False
export CACHE_DIR=/path/to/cache/dir
export FINETUNE_FEATURE_EXTRACTOR=True
export MODEL_LOAD_MODE=base_model_only
export SAVE=all
export SOURCE_CHECKPOINT=6
export MODEL_NAME_OR_PATH=/path/to/BERT/GParent/model/checkpoint-${SOURCE_CHECKPOINT}
export OUTPUT_DIR=/path/to/output/dir
python ./run_finetuning_QA.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--do_train \
--do_eval \
--do_lower_case \
--version_2_with_negative ${VERSION_2_WITH_NEGATIVE} \
--finetune_feature_extractor ${FINETUNE_FEATURE_EXTRACTOR} \
--model_load_mode ${MODEL_LOAD_MODE} \
--train_file ${TARGET_DATA_DIR}/train.json \
--predict_file ${TARGET_DATA_DIR}/dev.json \
--max_seq_length 384 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=12 \
--learning_rate 3e-5 \
--num_train_epochs 3 \
--doc_stride 128 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--save ${SAVE}
Intermediate-task transfer to sequence labeling (SL) tasks
Below, we perform intermediate-task transfer from the MNLI task (text classification/regression) to the Conj task (sequence labeling):
export SEED_ID=42
export TARGET_DATA_DIR=/path/to/Conj/data/dir
export MODEL_TYPE=bert
export LABEL_FILE=${TARGET_DATA_DIR}/labels.txt
export CACHE_DIR=/path/to/cache/dir
export FINETUNE_FEATURE_EXTRACTOR=True
export MODEL_LOAD_MODE=base_model_only
export SAVE=all
export SOURCE_CHECKPOINT=3
export MODEL_NAME_OR_PATH=/path/to/BERT/MNLI/model/checkpoint-${SOURCE_CHECKPOINT}
export OUTPUT_DIR=/path/to/output/dir
python ./run_finetuning_SL.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--do_train \
--do_eval \
--do_lower_case \
--labels ${LABEL_FILE}\
--finetune_feature_extractor ${FINETUNE_FEATURE_EXTRACTOR} \
--model_load_mode ${MODEL_LOAD_MODE} \
--data_dir ${TARGET_DATA_DIR} \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=32 \
--learning_rate 5e-5 \
--num_train_epochs 6 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--save ${SAVE}
TextEmb
TextEmb
is computed by pooling BERT’s final-layer token-level representations across an entire dataset, and as such captures properties of the text and domain. This method does not depend on the training labels and thus can be used in zero-shot transfer.
Figure 3a: A visualization of the task space of TextEmb
. It captures domain similarity (e.g., the Penn Treebank sequence labeling tasks are highly interconnected).
Computing TextEmb for text classification/regression (CR) tasks
You can run something like the following script to compute TextEmb
for the MRPC task:
export SEED_ID=42
export DATA_DIR=/path/to/MRPC/data/dir
export MODEL_TYPE=bert
export MODEL_NAME_OR_PATH=bert-base-uncased
export TASK_NAME=MRPC
export CACHE_DIR=/path/to/cache/dir
export OUTPUT_DIR=/path/to/output/dir
python run_textemb_CR.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--task_name ${TASK_NAME} \
--do_lower_case \
--data_dir ${DATA_DIR} \
--max_seq_length 128 \
--per_gpu_train_batch_size=32 \
--num_train_epochs 1 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--seed ${SEED_ID}
Let's look at the TextEmb
task embedding:
import numpy as np
task_emb = np.load("/path/to/output/dir/avg_sequence_output.npy").reshape(-1)
print(task_emb.shape) # (768, )
print(task_emb)
Computing TextEmb for question answering (QA) tasks
This example code computes TextEmb
for the SQuAD-2 task:
export SEED_ID=42
export DATA_DIR=/path/to/SQuAD-2/data/dir
export MODEL_TYPE=bert
export MODEL_NAME_OR_PATH=bert-base-uncased
export VERSION_2_WITH_NEGATIVE=True
export CACHE_DIR=/path/to/cache/dir
export OUTPUT_DIR=/path/to/output/dir
python ./run_textemb_QA.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--do_lower_case \
--version_2_with_negative ${VERSION_2_WITH_NEGATIVE} \
--train_file ${DATA_DIR}/train.json \
--max_seq_length 384 \
--per_gpu_train_batch_size=12 \
--num_train_epochs 1 \
--doc_stride 128 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--seed ${SEED_ID}
Computing TextEmb for sequence labeling (SL) tasks
Here is an example that computes TextEmb
for the NER task:
export SEED_ID=42
export DATA_DIR=/path/to/NER/data/dir
export MODEL_TYPE=bert
export MODEL_NAME_OR_PATH=bert-base-uncased
export LABEL_FILE=${DATA_DIR}/labels.txt
export CACHE_DIR=/path/to/cache/dir
export OUTPUT_DIR=/path/to/output/dir
python ./run_textemb_SL.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--do_lower_case \
--labels ${LABEL_FILE}\
--data_dir ${DATA_DIR} \
--max_seq_length 128 \
--per_gpu_train_batch_size=32 \
--num_train_epochs 1 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--seed ${SEED_ID}
TaskEmb
TaskEmb
relies on the correlation between the fine-tuning loss function and the parameters of BERT, and encodes more information about the type of knowledge and reasoning required to solve the task. It is computationally more expensive than TextEmb
since you need to fine-tune BERT. If you have enough training data for your task, you can try freezing BERT during fine-tuning and train the output layer only. We find empirically that TaskEmb
computed from a fine-tuned task-specific BERT result in better correlations to task transferability in data-constrained scenarios.
Figure 3b: A visualization of the task space of TaskEmb
. It focuses more on task similarity (e.g., the two part-of-speech tagging tasks are interconnected despite their domain dissimilarity).
Computing TaskEmb for text classification/regression (CR) tasks
In this example, we will compute TaskEmb
for the MRPC task. We assume that you have fine-tuned BERT on MRPC already.
Use the following script to compute TaskEmb
:
export SEED_ID=42
export DATA_DIR=/path/to/MRPC/data/dir
export MODEL_TYPE=bert
export TASK_NAME=MRPC
export USE_LABELS=True # set to False to sample from the model's predictive distribution
export CACHE_DIR=/path/to/cache/dir
export CHECKPOINT=3
export MODEL_NAME_OR_PATH=/path/to/BERT/MRPC/model/checkpoint-${CHECKPOINT}
# we start from a fine-tuned task-specific BERT so no need for further fine-tuning
export FURTHER_FINETUNE_CLASSIFIER=False
export FURTHER_FINETUNE_FEATURE_EXTRACTOR=False
export OUTPUT_DIR=/path/to/output/dir
python ./run_taskemb_CR.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--task_name ${TASK_NAME} \
--use_labels ${USE_LABELS} \
--do_lower_case \
--finetune_classifier ${FURTHER_FINETUNE_CLASSIFIER} \
--finetune_feature_extractor ${FURTHER_FINETUNE_FEATURE_EXTRACTOR} \
--data_dir ${DATA_DIR} \
--max_seq_length 128 \
--batch_size=32 \
--learning_rate 2e-5 \
--num_epochs 1 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--seed ${SEED_ID}
Let's look at the TaskEmb
task embedding computed from a specific component of BERT, i.e., the CLS
output vector:
import numpy as np
task_emb = np.load("/path/to/output/dir/cls_output.npy").reshape(-1)
print(task_emb.shape) # (768, )
print(task_emb)
Computing TaskEmb for question answering (QA) tasks
Let's compute TaskEmb
for the SQuAD-2 task. We assume that you have fine-tuned BERT on SQuAD-2 already.
Run the following script to compute TaskEmb
:
export SEED_ID=42
export DATA_DIR=/path/to/SQuAD-2/data/dir
export MODEL_TYPE=bert
export VERSION_2_WITH_NEGATIVE=True
export USE_LABELS=True # set to False to sample from the model's predictive distribution
export CACHE_DIR=/path/to/cache/dir
export CHECKPOINT=3
export MODEL_NAME_OR_PATH=/path/to/BERT/SQuAD-2/model/checkpoint-${CHECKPOINT}
# we start from a fine-tuned task-specific BERT so no need for further fine-tuning
export FURTHER_FINETUNE_CLASSIFIER=False
export FURTHER_FINETUNE_FEATURE_EXTRACTOR=False
export OUTPUT_DIR=/path/to/output/dir
python ./run_taskemb_QA.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--use_labels ${USE_LABELS} \
--do_lower_case \
--finetune_classifier ${FURTHER_FINETUNE_CLASSIFIER} \
--finetune_feature_extractor ${FURTHER_FINETUNE_FEATURE_EXTRACTOR} \
--train_file ${DATA_DIR}/train.json \
--version_2_with_negative ${VERSION_2_WITH_NEGATIVE} \
--max_seq_length 384 \
--batch_size=12 \
--learning_rate 3e-5 \
--num_epochs 1 \
--doc_stride 128 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--seed ${SEED_ID}
Computing TaskEmb for sequence labeling (SL) tasks
This example code computes TaskEmb
for the NER task, which assumes that you have fine-tuned BERT on NER already:
export SEED_ID=42
export DATA_DIR=/path/to/NER/data/dir
export MODEL_TYPE=bert
export LABEL_FILE=${DATA_DIR}/labels.txt
export USE_LABELS=True # set to False to sample from the model's predictive distribution
export CACHE_DIR=/path/to/cache/dir
export CHECKPOINT=6
export MODEL_NAME_OR_PATH=/path/to/BERT/NER/model/checkpoint-${CHECKPOINT}
# we start from a fine-tuned task-specific BERT so no need for further fine-tuning
export FURTHER_FINETUNE_CLASSIFIER=False
export FURTHER_FINETUNE_FEATURE_EXTRACTOR=False
export OUTPUT_DIR=/path/to/output/dir
python ./run_taskemb_CR.py \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--labels ${LABEL_FILE} \
--use_labels ${USE_LABELS} \
--do_lower_case \
--finetune_classifier ${FURTHER_FINETUNE_CLASSIFIER} \
--finetune_feature_extractor ${FURTHER_FINETUNE_FEATURE_EXTRACTOR} \
--data_dir ${DATA_DIR} \
--max_seq_length 128 \
--batch_size=32 \
--learning_rate 5e-5 \
--num_epochs 1 \
--output_dir ${OUTPUT_DIR} \
--cache_dir ${CACHE_DIR} \
--seed ${SEED_ID}
Pretrained models and precomputed task embeddings
Using task embeddings for source task selection
This notebook demonstrates using the TextEmb
task embedding for source task selection.
Citation
If you find this repository useful, please cite our paper:
@inproceedings{vu-etal-2020-task-transferability,
title = "Exploring and Predicting Transferability across {NLP} Tasks",
author = "Vu, Tu and
Wang, Tong and
Munkhdalai, Tsendsuren and
Sordoni, Alessandro and
Trischler, Adam and
Mattarella-Micke, Andrew and
Maji, Subhransu and
Iyyer, Mohit",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.635",
pages = "7882--7926",
}