pytorch-openai-transformer-lm
Having various network heads
Hi!
In the research paper, the authors tackle many different problems with the same base architecture; this is one of the main strengths of the article. Unfortunately, the current version of the code only supports multiple-choice tasks such as ROCStories.
This is what I would like to fix in a future patch. By providing model heads dedicated to tasks other than multiple-choice problems, we can make this code useful to a lot more people.
I have already started working on this and I would like to get your opinions on a few design choices.
This is the new version of the DoubleHeadModel class:
import collections.abc

import torch.nn as nn


class DoubleHeadModel(nn.Module):
    """ Transformer with language model and task specific heads """

    def __init__(self, cfg, clf_token, task_head_type, vocab=40990, n_ctx=512):
        super(DoubleHeadModel, self).__init__()
        self.transformer = TransformerModel(cfg, vocab=vocab, n_ctx=n_ctx)
        self.lm_head = LMHead(self.transformer, cfg)
        if isinstance(task_head_type, str):
            if task_head_type == 'multiple_choice':
                self.task_head = MultipleChoiceHead(clf_token, cfg)
            elif task_head_type == 'similarity':
                self.task_head = SimilarityHead(clf_token, cfg)
            elif task_head_type == 'inference':
                # the three classes correspond to entailment, contradiction and neutral
                self.task_head = ClfHead(clf_token, cfg, 3)
            else:
                raise ValueError("task_head_type is expected to be 'multiple_choice' "
                                 "'similarity', 'inference' or ('classification', n_class) "
                                 f"got {task_head_type}.")
        elif isinstance(task_head_type, collections.abc.Sequence) and \
                len(task_head_type) == 2 and task_head_type[0] == 'classification':
            n_class = task_head_type[1]
            self.task_head = ClfHead(clf_token, cfg, n_class)
        else:
            raise ValueError("task_head_type is expected to be 'multiple_choice' "
                             "'similarity', 'inference' or ('classification', n_class) "
                             f"got {task_head_type}.")

    def forward(self, x):
        h = self.transformer(x)
        lm_logits = self.lm_head(h)
        task_logits = self.task_head(h, x)
        return lm_logits, task_logits
The __init__ method takes a new argument task_head_type which can be one of the following:
- "multiple_choice" for multiple choice problems (corresponds to the current ClfHead) such as ROCStories.
- "similarity" for similarity tasks such as Quora Question Pairs (QQP) and the Semantic Textual Similarity benchmark (STS-B).
- "inference" for Natural Language Inference (NLI) tasks such as SNLI, QNLI and MNLI. Inference problems are treated as classification problems with 3 classes: entailment, contradiction and neutral.
- ("classification", n_class) for classification tasks such as the Corpus of Linguistic Acceptability (CoLA) and the Stanford Sentiment Treebank (SST-2).
The code for the various heads is the following:
class MultipleChoiceHead(nn.Module):
    """ Multiple Choice Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(MultipleChoiceHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout2d(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, 1)

        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        # Classification logits
        clf_h = h.view(-1, self.n_embd)
        flat = x[..., 0].contiguous().view(-1)
        # keep only the hidden states at the classifier token positions
        clf_h = clf_h[flat == self.clf_token, :]
        clf_h = clf_h.view(-1, x.size(1), self.n_embd, 1)
        # Dropout2d drops whole embedding channels across all choices of an example
        clf_h = self.dropout(clf_h.transpose(1, 2)).transpose(1, 2)
        clf_h = clf_h.contiguous().view(-1, self.n_embd)
        clf_logits = self.linear(clf_h)
        return clf_logits.view(-1, x.size(1))

class ClfHead(nn.Module):
    """ Classification Head for the transformer """

    def __init__(self, clf_token, cfg, n_class):
        super(ClfHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, n_class)

        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        clf_h = h.view(-1, self.n_embd)
        flat = x[..., 0].contiguous().view(-1)
        # keep only the hidden states at the classifier token positions
        clf_h = clf_h[flat == self.clf_token, :]
        clf_h = self.dropout(clf_h)
        clf_logits = self.linear(clf_h)
        return clf_logits

class SimilarityHead(nn.Module):
    """ Similarity Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(SimilarityHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, 1)  # single similarity score per pair

        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)
    def forward(self, h, x):
        sim_h = h.view(-1, self.n_embd)
        flat = x[..., 0].contiguous().view(-1)
        # keep only the hidden states at the classifier token positions
        sim_h = sim_h[flat == self.clf_token, :]
        # each pair is fed in both orderings, so group the two states per pair
        # before summing them element-wise as described in the paper
        sim_h = sim_h.view(-1, 2, self.n_embd)
        sim_h = self.dropout(sim_h)
        sim_h = sim_h.sum(dim=1)
        sim_logits = self.linear(sim_h)
        return sim_logits
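Since SimilarityHead assumes each text pair appears in both orderings (the two classifier-token states are grouped and summed), here is a quick, untested shape check of the head in isolation; the sizes, the toy cfg and the dummy tensors are made up purely for illustration:

import torch
from types import SimpleNamespace

cfg = SimpleNamespace(n_embd=768, clf_pdrop=0.1)
clf_token = 7  # arbitrary token id for this toy check
head = SimilarityHead(clf_token, cfg)

# 4 question pairs, each encoded in both orderings, with sequences of length 10
h = torch.randn(4, 2, 10, 768)          # fake transformer output (batch, ordering, n_ctx, n_embd)
x = torch.randint(0, 7, (4, 2, 10, 2))  # fake inputs (..., [token id, position id])
x[..., -1, 0] = clf_token               # classifier token at the last position of every sequence

sim_logits = head(h, x)
print(sim_logits.shape)                 # expected: torch.Size([4, 1])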
Do you think that this new design is reasonable?
If this code seems OK, I would like to test it before creating a pull request. Unfortunately, I will not have the time to test SimilarityHead. Would anyone like to work with me on this?
Looks good to me! I will merge your PR.
I can help you test the SimilarityHead, but not before the end of August, so if someone wants to tackle this question during the summer, please do!
There are a few discussions related to this on OpenAI's repo that are probably worth following:
- https://github.com/openai/finetune-transformer-lm/issues/11
- https://github.com/openai/finetune-transformer-lm/issues/13