Sumanth Dathathri
I suspect this might have to do with using a GPT-2 model of the wrong size....
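If it helps, here is a minimal sanity check, assuming the Hugging Face transformers API (if I recall correctly, the released attribute models pair with gpt2-medium, whose hidden size is 1024, vs. 768 for the small model):

```
from transformers import GPT2LMHeadModel

# Hypothetical sanity check: a size mismatch between the base LM and the
# attribute model typically shows up as a hidden-dimension mismatch.
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
assert model.config.n_embd == 1024, "expected gpt2-medium hidden size"
```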
I think one option would be to compute the probability of multiple tokens being generated and use that the same way the single token probability is being used. Let's say...
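To make the idea concrete, a hypothetical sketch (not code from the repo, and it assumes a Hugging Face-style GPT-2 whose forward pass returns logits first): the probability of a multi-token continuation is the product of the per-step conditionals, so you can sum log-probabilities.

```
import torch
import torch.nn.functional as F

def multi_token_log_prob(model, context_ids, target_ids):
    """Log-probability of `target_ids` being generated after `context_ids`.

    Hypothetical helper: sums per-step conditional log-probs, i.e. the log of
    the product of single-token probabilities.
    """
    input_ids = torch.cat([context_ids, target_ids], dim=-1).unsqueeze(0)
    logits = model(input_ids)[0]  # (1, seq_len, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    offset = context_ids.size(-1)
    total = 0.0
    for i, tok in enumerate(target_ids.tolist()):
        # Logits at position p predict the token at position p + 1.
        total += log_probs[0, offset + i - 1, tok]
    return total
```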
I think that is actually a bug, and it might also explain why our experiments with horizon_length > 1 did not work so well (we use horizon_length=1 in all of our...
@ehsan-soe You can compute the perplexity of the generated text under another language model (GPT), which is what we do here. @Guaguago human_annotation/pplm_labeled_csvs has the generated samples. You...
You can use the 'parse_*.ipynb' notebooks to process the CSVs. That should give you samples from different models separately.
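If you'd rather not run the notebooks, something like the following reads the labeled CSVs directly (a sketch; inspect the headers first, since the column names vary by file and I'm not assuming any here):

```
import glob

import pandas as pd

# Load each labeled CSV and print its columns so you can decide how to
# filter samples by model/condition.
for path in glob.glob("human_annotation/pplm_labeled_csvs/*.csv"):
    df = pd.read_csv(path)
    print(path, list(df.columns), len(df))
```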
Are your scores in the same range as those in the paper? 1. The 360 samples are for pairwise A/B testing from the ablation study -- it consists of 6 pairs, so...
```
import math

import torch
from pytorch_pretrained_bert import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

def score(sentence, tokenizer, model):
    # Perplexity of `sentence` under the given LM: exp(mean token NLL).
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)]).cuda()
    loss = model(tensor_input, lm_labels=tensor_input)  # mean cross-entropy
    return math.exp(loss)

tokenizer_LM = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model_LM = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt').cuda().eval()
```
Yes, the samples are the same....
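A usage sketch (any sample string works; the value will vary):

```
ppl = score("The chicken is a domesticated bird raised for meat and eggs.",
            tokenizer_LM, model_LM)
print('GPT perplexity: {:.2f}'.format(ppl))
```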
Hi, sorry for the late response; it's been hectic. Overall, what you are doing seems reasonable to me. 3. We don't actually drop ''. Can you drop me (and Andrea)...
You will get different samples if you change the seed, but that should be OK.
Can you point out the lines you are commenting out? As long as you're not disabling the KL and geometric-mean (GM) fusion terms, it should be fine.
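For context, a schematic of those two terms (my paraphrase of the PPLM scheme, not the repo's exact code): the KL term penalizes the perturbed next-token distribution drifting from the unperturbed one during the latent update, and the GM term fuses the two distributions via a geometric mean before sampling.

```
import torch

def gm_fuse(pert_probs, unpert_probs, gm_scale=0.9):
    # Geometric-mean fusion: pert^gm_scale * unpert^(1 - gm_scale), renormalized.
    fused = (pert_probs ** gm_scale) * (unpert_probs ** (1.0 - gm_scale))
    return fused / fused.sum(dim=-1, keepdim=True)

def kl_penalty(pert_probs, unpert_probs, kl_scale=0.01, eps=1e-10):
    # Scaled KL(pert || unpert), added to the attribute loss during the update.
    pert = pert_probs + eps
    unpert = unpert_probs + eps
    return kl_scale * (pert * (pert / unpert).log()).sum(dim=-1)
```

Disabling either one (e.g., setting gm_scale=1.0 or kl_scale=0.0 above) tends to hurt fluency, which is why I'd keep both enabled.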