pytorch-openai-transformer-lm
🔥 A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
I adapted this model to a text classification problem, where my text is concatenated as: [start] text1 [delimiter] text2 [delimiter] text3 [classify] and it is just a binary classification problem....
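The packing described above can be sketched as follows. This is a minimal illustration, not code from the repo: the special-token ids (`start`, `delim`, `clf`) and the toy token ids are assumed for the example.

```python
# Hypothetical special-token ids appended after the regular vocabulary.
start, delim, clf = 100, 101, 102

def pack(text1, text2, text3):
    """Concatenate three token-id lists as
    [start] text1 [delimiter] text2 [delimiter] text3 [classify]."""
    return [start] + text1 + [delim] + text2 + [delim] + text3 + [clf]

x = pack([1, 2], [3], [4, 5])
print(x)  # [100, 1, 2, 101, 3, 101, 4, 5, 102]
```

The classification head then reads the hidden state at the final [classify] position, so that token must always close the sequence.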
The Similarity Head and loss function were tested on the STS-B dataset, achieving nearly the same performance as reported (82.45% Pearson correlation, versus the 82% in the paper). I can provide...
I'd like to use GPT to encode my dataset and use the representations further for the task of question generation. I have problems with understanding the code and the name...
I am trying to use this repository to train a language model with an additional input. My data looks like this: ``` ┌─────────┬─────┬────┬───┐ │side info│start│The │cat│ └─────────┴─────┴────┴───┘ ``` The labels...
Hi all, I am trying to train a new dataset with a similar structure to rocstories. It has a story part, 2 options and one correct option. I just added...
Given the transformation method for the ROC Stories dataset, which is ``` def transform_roc(X1, X2, X3): n_batch = len(X1) xmb = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32) mmb = np.zeros((n_batch, 2, n_ctx),...
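For a single (story, option) pair, the layout that `transform_roc` builds can be sketched as below. The sizes (`n_vocab`, `n_ctx`) and special-token ids are assumed toy values; the key points are the two channels of `xmb` (channel 0: token ids, channel 1: position ids) and the float mask `mmb` marking real tokens versus padding.

```python
import numpy as np

n_vocab = 10                              # assumed token-vocabulary size
start, delimiter, clf_token = 10, 11, 12  # assumed special-token ids
n_special = 3
n_ctx = 8                                 # assumed context length

def transform_pair(story, option):
    """Pack one (story, option) pair: [start] story [delimiter] option [clf],
    pad to n_ctx, and fill a second channel with position ids."""
    x = [start] + story + [delimiter] + option + [clf_token]
    l = len(x)
    xmb = np.zeros((n_ctx, 2), dtype=np.int32)
    mmb = np.zeros(n_ctx, dtype=np.float32)
    xmb[:l, 0] = x    # channel 0: token ids, zero-padded
    mmb[:l] = 1       # mask: 1 for real tokens, 0 for padding
    # channel 1: position ids, which start right after token + special vocab
    xmb[:, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
    return xmb, mmb

xmb, mmb = transform_pair([1, 2], [3])
# xmb[:, 0] holds [10, 1, 2, 11, 3, 12] followed by zero padding
# mmb holds six 1s followed by two 0s
```

In the real function this is done twice per example (once per candidate ending), which is where the extra batch dimension of size 2 comes from.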
I know that `n_vocab` is the total number of tokens in the encoder dictionary. But when I saw `vocab = n_vocab + n_special + n_ctx`, I was confused; maybe `n_special`...
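The `+ n_ctx` term exists because this model learns positional embeddings in the same embedding matrix as the tokens: each of the `n_ctx` positions gets its own row after the token and special-token rows, and the model sums the token-embedding and position-embedding lookups. A toy sketch, with assumed sizes:

```python
import numpy as np

# Sizes mirroring the naming in the question (values assumed for illustration)
n_vocab, n_special, n_ctx = 40478, 3, 512
vocab = n_vocab + n_special + n_ctx  # one row per token, special token, AND position
d_model = 4                          # toy embedding dimension

rng = np.random.default_rng(0)
embed = rng.standard_normal((vocab, d_model))  # single shared embedding matrix

token_ids = np.array([5, 17, 2])
# Positions are encoded as extra "vocabulary" indices past the special tokens
pos_ids = n_vocab + n_special + np.arange(len(token_ids))
h = embed[token_ids] + embed[pos_ids]  # token embedding + position embedding
print(h.shape)  # (3, 4)
```

Treating positions as extra vocabulary entries is why the input tensor has that second channel of position ids rather than a separate positional-embedding module.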
From the [ConvAI slides](convai.io), it sounds like the Hugging Face submission was based on this model -- is the code for your ConvAI system available somewhere to take a...
I am confused by the code below. https://github.com/huggingface/pytorch-openai-transformer-lm/blob/eafc28abdfadfa0732f03a0fc65805c5bfb2ffe7/train.py#L52 https://github.com/huggingface/pytorch-openai-transformer-lm/blob/eafc28abdfadfa0732f03a0fc65805c5bfb2ffe7/train.py#L54 Is this due to any normalization? Thanks!
Just curious: what would be the place to start to create a seq2seq model for response generation on, say, the persona-chat dataset?