pytorch-openai-transformer-lm
Can someone explain this line?
If my understanding is correct, this line finds the positions where there is a delimiter and filters for them. How does this help with training?
https://github.com/huggingface/pytorch-openai-transformer-lm/blob/253ca422bbf94b19da2a4aa8f1b294e01ab8be37/model_pytorch.py#L207
When the information reaches the classification head, there is one vector of dimension n_embd
associated with each position of each input. If you want a single prediction per input (as is the case with classification tasks), you have to select one of these vectors.
As the transformer network is auto-regressive, the value you select has to be the rightmost one, which corresponds to clf_token,
since the input is created like this:
x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
x13 = [start] + x1[:max_len] + [delimiter] + x3[:max_len] + [clf_token]
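In code, the selection looks roughly like this. This is only a minimal sketch, not the repository's exact implementation; pool_clf_token is a hypothetical helper, and I'm assuming hidden states h of shape (batch, seq_len, n_embd) and token ids x of shape (batch, seq_len):

import torch

def pool_clf_token(h, x, clf_token):
    # h: (batch, seq_len, n_embd) hidden states out of the transformer
    # x: (batch, seq_len) token ids of the corresponding inputs
    batch, seq_len, n_embd = h.shape
    flat_h = h.reshape(-1, n_embd)        # (batch * seq_len, n_embd)
    flat_x = x.reshape(-1)                # (batch * seq_len,)
    # clf_token appears exactly once, at the end of each input, so this keeps
    # exactly one row per sequence: the rightmost hidden state.
    return flat_h[flat_x == clf_token]    # (batch, n_embd)

The returned (batch, n_embd) tensor is what the classification head then projects to class logits.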
@rodgzilla Thanks a lot for the explanation, it makes a lot of sense! Out of curiosity, why can't all the values be used?
Well, for a classifier we usually want a fixed-length representation of the sentence, so we can't really use a varying number of values. Starting from that, the last hidden state is the most logical summary of the sentence. But there are other possible options of course, feel free to try your ideas!
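For example, here is a hedged sketch of two common fixed-length pooling options (last_token_pool and mean_pool are illustrative names, not functions from this repository; h is assumed to be (batch, seq_len, n_embd) and mask marks the non-padding positions):

import torch

def last_token_pool(h):
    # rightmost hidden state of every sequence
    return h[:, -1, :]

def mean_pool(h, mask):
    # average over the real (non-padding) positions
    mask = mask.unsqueeze(-1).float()                            # (batch, seq_len, 1)
    return (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)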
In the original OpenAI code (https://github.com/openai/finetune-transformer-lm/blob/bd1cf7d678926041e6d19193cab7e5cd8ce2fce6/train.py#L191), in the model function in train.py there is the line clf_logits = clf(clf_h, 1, train=train).
Why is ny 1? Shouldn't it be 2, because we have two classes? Is there a reason to use 1 and then later reshape the second dimension of the logits to 2? I really appreciate your help.