pytorch-openai-transformer-lm

How to create transforms for the entailment task?

lordzuko opened this issue 6 years ago • 12 comments

Given the transformation method for the ROCStories dataset, which is:

def transform_roc(X1, X2, X3):
    n_batch = len(X1)
    # last axis holds [token id, position id]; axis 1 holds the two candidate endings
    xmb = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32)
    mmb = np.zeros((n_batch, 2, n_ctx), dtype=np.float32)
    start = encoder['_start_']
    delimiter = encoder['_delimiter_']
    for i, (x1, x2, x3) in enumerate(zip(X1, X2, X3)):
        x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
        x13 = [start] + x1[:max_len] + [delimiter] + x3[:max_len] + [clf_token]
        l12 = len(x12)
        l13 = len(x13)
        xmb[i, 0, :l12, 0] = x12
        xmb[i, 1, :l13, 0] = x13
        mmb[i, 0, :l12] = 1  # mask marks the real (non-padding) positions
        mmb[i, 1, :l13] = 1
    # Position information that is added to the input embeddings in the TransformerModel
    xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
    return xmb, mmb

I have created the following transform for the entailment task:

def transform_entailment(X1, X2):
    n_batch = len(X1)
    # only one sequence per example, hence size 1 on axis 1
    xmb = np.zeros((n_batch, 1, n_ctx, 2), dtype=np.int32)
    mmb = np.zeros((n_batch, 1, n_ctx), dtype=np.float32)
    start = encoder['_start_']
    delimiter = encoder['_delimiter_']
    for i, (x1, x2) in enumerate(zip(X1, X2)):
        x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
        l12 = len(x12)
        xmb[i, 0, :l12, 0] = x12
        mmb[i, 0, :l12] = 1  # mask marks the real (non-padding) positions

    # Position information that is added to the input embeddings in the TransformerModel
    xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
    return xmb, mmb
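
For context, this would be called the same way transform_roc is called in train.py (a sketch; trX1/trX2 are hypothetical names for lists of BPE-encoded token lists):

# hypothetical usage, mirroring how transform_roc is called in train.py
trX, trM = transform_entailment(trX1, trX2)
# trX: (n_examples, 1, n_ctx, 2), trM: (n_examples, 1, n_ctx)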

Using this, I get the following error during loss computation:

Namespace(afn='gelu', analysis=False, attn_pdrop=0.1, b1=0.9, b2=0.999, bpe_path='data/dataset_tweet_encode/vocab_40000.bpe', clf_pdrop=0.1, data_dir='data/', dataset=None, desc=None, e=1e-08, embd_pdrop=0.1, encoder_path='data/dataset_tweet_encode/encoder_bpe_40000.json', l2=0.01, lm_coef=0.5, log_dir='log/', lr=6.25e-05, lr_schedule='warmup_linear', lr_warmup=0.002, max_grad_norm=1, n_batch=1, n_ctx=512, n_embd=768, n_head=12, n_iter=3, n_layer=12, n_transfer=12, n_valid=0.1, opt='adam', resid_pdrop=0.1, save_dir='save/', seed=42, submission_dir='submission/', submit=False, vector_l2=False)


Traceback (most recent call last):                                              
  File "train.py", line 225, in <module>
    run_epoch()
  File "train.py", line 83, in run_epoch
    compute_loss_fct(XMB, YMB, MMB, clf_logits, lm_logits)
  File "/home/lordzuko/PycharmProjects/Transformer-Pytorch/loss.py", line 53, in __call__
    lm_losses = self.lm_criterion(lm_logits, x_shifted)
  File "/home/lordzuko/envs/pytorch-0.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lordzuko/envs/pytorch-0.4/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 862, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/lordzuko/envs/pytorch-0.4/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/lordzuko/envs/pytorch-0.4/lib/python3.6/site-packages/torch/nn/functional.py", line 1405, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (66) to match target batch_size (0).

Can anyone please guide me through this?

lordzuko avatar Sep 09 '18 15:09 lordzuko

Do you use ClassificationLossCompute? I had the same problem when I used it for a classification task, but it turns out it computes the same loss (cross-entropy) as MultipleChoiceLossCompute; only the views (reshapes) are different, and they are bugged in ClassificationLossCompute. So you just have to replace ClassificationLossCompute with MultipleChoiceLossCompute and it should work.
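
For concreteness, the swap in train.py would look like this (a sketch; the constructor arguments are assumed to mirror how ClassificationLossCompute is built there):

from loss import MultipleChoiceLossCompute

# before (bugged reshapes):
# compute_loss_fct = ClassificationLossCompute(criterion, criterion, args.lm_coef, model_opt)
# after:
compute_loss_fct = MultipleChoiceLossCompute(criterion, criterion, args.lm_coef, model_opt)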

artemisart avatar Sep 17 '18 08:09 artemisart

@artemisart Yes, I was using the same; I made the changes you suggested and now it's working. Thank you!!

lordzuko avatar Sep 19 '18 09:09 lordzuko

Hi @artemisart @lordzuko! If there is a bug in ClassificationLossCompute, do you think it's worth opening an issue specifically about it?

davidefiocco avatar Sep 26 '18 14:09 davidefiocco

xmb = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32)
mmb = np.zeros((n_batch, 2, n_ctx), dtype=np.float32)

Hi, can anyone tell me what the 2 between n_batch and n_ctx means?

p-null avatar Oct 25 '18 17:10 p-null

From my understanding, it's the number of texts to be processed in parallel (for each example) by the Transformer: 1 for text classification, 2 for the Story Cloze Test (ROCStories), n for multiple choice, etc.
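
To make the shapes concrete (a sketch with made-up sizes, not code from the repo):

import numpy as np

n_batch, n_ctx = 8, 512
# classification / entailment: one sequence per example
xmb_clf = np.zeros((n_batch, 1, n_ctx, 2), dtype=np.int32)
# ROCStories: two candidate endings per example
xmb_roc = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32)
# the last axis holds [token id, position id] in both cases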

artemisart avatar Oct 25 '18 19:10 artemisart

Hi

@artemisart is correct.

rodgzilla avatar Oct 26 '18 06:10 rodgzilla

I uploaded openai-gpt for the classification task, and it can reproduce the result reported in the original paper.

p-null avatar Oct 29 '18 19:10 p-null

What is the use of mmb?

zhipeng-fan avatar Jan 16 '19 03:01 zhipeng-fan

I see, it is the mask.
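
Specifically, it marks the real (non-padding) positions so the language-modeling loss can ignore padding. A minimal illustration (assumed shapes, not the repo's exact loss code):

import numpy as np

n_ctx = 8
mmb = np.zeros((1, 1, n_ctx), dtype=np.float32)
mmb[0, 0, :5] = 1                    # first 5 positions hold real tokens
per_token_loss = np.ones(n_ctx)      # stand-in for per-token LM losses
masked = per_token_loss * mmb[0, 0]  # zero out padding positions
print(masked.sum() / mmb.sum())      # average over real tokens only -> 1.0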

zhipeng-fan avatar Jan 16 '19 03:01 zhipeng-fan

Hi @lordzuko, I am also trying to use the transformer model for the entailment task and to replicate the SNLI results from the paper. Could you please let me know what changes you made to the train.py file for the entailment task?

aayushee avatar Mar 14 '19 06:03 aayushee

Hello @davidefiocco, could you please upload the dataset you have processed? I am also a newcomer who wants to see the whole process of running this program. If it is convenient for you, you could also send the two files to the mailbox [email protected]. Thank you very much. I saw your thoughts and answers, and they helped me a lot.

BUPTHYP avatar May 29 '19 02:05 BUPTHYP

@lordzuko Thank you very much! Reading your conversation with them helped me a lot.

BUPTHYP avatar May 29 '19 02:05 BUPTHYP