gpt-2-output-dataset
Question about the meaning of the "length" attribute in the dataset
About your dataset: does the "length" attribute represent the length of the "text" attribute, or something else? I don't think it means the length of "text". For example, in the file "medium-345M-k40.train.jsonl", "length" = 1024, but the length of the text that I calculated is 4750. So I would like to know the meaning of the "length" attribute. I look forward to your reply. Thank you very much.
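The check described in the question can be reproduced with a short script. This is a minimal sketch. It assumes that each line of the `.jsonl` file is a JSON object with `"text"` and `"length"` keys, as the question describes, and that the 4750 figure is a character count from `len(text)`. `compare_lengths` is a hypothetical helper, not part of the dataset's tooling. For each record it reports the stored `length` alongside a character count and a whitespace word count, which makes the mismatch visible across several records at once.

```python
import json
from typing import Iterable, Iterator, Tuple


def compare_lengths(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (stored "length", character count, whitespace word count) per record.

    Assumes every non-empty line is a JSON object with "text" and "length" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        text = record["text"]
        yield record["length"], len(text), len(text.split())
```

Passing an open file handle works because iterating over a file yields its lines. For example, `compare_lengths(open("medium-345M-k40.train.jsonl", encoding="utf-8"))` reads the file from wherever it was downloaded.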
If you're referring to the length parameter in this function:
def interact_model(
    model_name='345M',  # 345M/774M need the 8 GB Pi 4B (memory allocation issue); 1558M is too big even for that
    seed=None,
    nsamples=1,
    batch_size=1,
    length=140,
    temperature=1.2,
    top_k=48,
    top_p=0.7,
    models_dir='models',
):
    ...  # function body omitted in this excerpt
Then length refers to the maximum number of tokens (subword units, not whole words) the output will contain. I keep mine short and sweet at a maximum length of 140 because I use GPT-2 to give my robots conversational responses. But if you want it to write an article, it certainly can.
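To make the parameters above more concrete, here is a toy sketch of a sampling loop. In this loop, `length` sets how many tokens are generated, `temperature` rescales the logits, and `top_k`/`top_p` restrict sampling to the most likely candidates. It only mirrors the general technique (temperature scaling, then top-k, then nucleus/top-p filtering). It is not the GPT-2 repository's code, and `filter_logits`, `generate`, and `toy_model` are hypothetical names. Each loop iteration produces one token, not one word.

```python
import math
import random
from typing import Callable, List, Optional


def filter_logits(logits: List[float], top_k: int = 0, top_p: float = 1.0) -> List[float]:
    """Mask (set to -inf) every logit outside the top-k and nucleus (top-p) sets.

    top_k <= 0 disables the top-k cut; top_p >= 1.0 disables the nucleus cut.
    """
    # Candidate indices, most likely first (sorted is stable, so ties keep index order).
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the surviving candidates, then keep the smallest prefix
    # whose cumulative probability reaches top_p.
    m = logits[order[0]]
    exps = [math.exp(logits[i] - m) for i in order]
    total = sum(exps)
    kept, cumulative = set(), 0.0
    for idx, e in zip(order, exps):
        kept.add(idx)
        cumulative += e / total
        if cumulative >= top_p:
            break
    return [x if i in kept else float("-inf") for i, x in enumerate(logits)]


def generate(
    next_logits: Callable[[List[int]], List[float]],
    context: List[int],
    length: int,
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample `length` new token ids after `context` (requires temperature > 0)."""
    rng = rng or random.Random()
    tokens = list(context)
    generated = []
    for _ in range(length):  # one iteration == one token, not one word
        logits = [x / temperature for x in next_logits(tokens)]
        logits = filter_logits(logits, top_k, top_p)
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]  # masked entries get weight 0.0
        token = rng.choices(range(len(weights)), weights=weights)[0]
        tokens.append(token)
        generated.append(token)
    return generated


def toy_model(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a language model over a 10-token vocabulary."""
    last = tokens[-1] if tokens else 0
    return [float((last + i) % 5) for i in range(10)]
```

With `top_k=1`, the loop reduces to greedy decoding, which makes it easy to sanity-check the filtering. For example, `generate(lambda toks: [0.0, 5.0, 1.0], [], length=3, top_k=1)` always returns `[1, 1, 1]`.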
First of all, thank you very much for your reply, but I still don't understand. I understand that 1024 is the maximum length, and I understand the "text" attribute to be the text generated by GPT-2. Is my understanding correct? If so, the "length" attribute should be equal to the length of "text". However, in the dataset you provided, the length I calculated for the "text" attribute is not equal to the given value of the "length" attribute, so I would like to know what the "length" attribute stands for. Looking forward to your help and reply.