Matthew Hayes
I've done some investigation on this. Our pipeline takes the instruction and formats it into a prompt (the same prompt used for training). The prompt is about 23 tokens when...
I'll have to think more about whether we should make any changes to the pipeline or model config. We could compute the max length by doing the math as I...
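The math in question can be sketched as follows. This is only an illustration: the context window (2048) is an assumed figure, and the 23-token prompt overhead is the approximate number mentioned above, not an exact constant.

```python
# Sketch of the max-length arithmetic (assumed numbers; the real context
# window and prompt overhead depend on the model and the prompt template).
MODEL_MAX_TOKENS = 2048   # assumed context window
PROMPT_OVERHEAD = 23      # approximate prompt-template tokens, per above

def max_new_tokens(instruction_tokens: int) -> int:
    """Tokens left for generation after the template and instruction."""
    return max(0, MODEL_MAX_TOKENS - PROMPT_OVERHEAD - instruction_tokens)

print(max_new_tokens(100))  # 1925
```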
Are you sure that path exists and contains the model? Usually I’ve seen that when the model does not exist at that path or some expected files are missing.
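One quick way to rule this out is to list the directory and compare against the files you expect. A minimal sketch, assuming a local checkpoint directory; the expected file names here are placeholders and vary by model format:

```python
import os

# Hypothetical sanity check for a local model directory.
# EXPECTED is an assumption; the actual required files depend on the
# model format (e.g. safetensors vs. PyTorch bin shards).
EXPECTED = ["config.json"]

def find_missing(model_dir: str) -> list:
    """Return expected files not present in model_dir."""
    if not os.path.isdir(model_dir):
        return ["<directory itself>"]
    present = set(os.listdir(model_dir))
    return [f for f in EXPECTED if f not in present]
```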
Rather than checking translations of the dataset into this repo, I think it's preferable to upload the translation to Hugging Face. We can add a note in the README...
I don't have a good explanation for why it jumps on your own checkpoint but not the pretrained GPT-J-6B. I have noticed this type of behavior before when training and...
> it always contain my original prompt which I don't need

The standard `text-generation` pipeline will output the original prompt with the completion appended. From your code above it appears this...
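If you only want the completion, you can strip the echoed prompt off the front of the returned text. A minimal sketch (pure string handling, independent of any library):

```python
def strip_prompt(generated: str, prompt: str) -> str:
    """Remove the echoed prompt prefix from a generation's output text."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated
```

If I recall the `transformers` API correctly, the `text-generation` pipeline also accepts `return_full_text=False`, which makes it return only the completion in the first place.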
> sometimes it can have many repeated sentences/phrases

It's hard to know what could be causing this. It may have to do with the custom dataset. You can also try using...
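One way to quantify the repetition before tweaking anything is a rough n-gram check over the output text. A sketch (the choice of word 4-grams is arbitrary):

```python
from collections import Counter

def repeated_ngrams(text: str, n: int = 4) -> list:
    """Return word n-grams that occur more than once (rough repetition check)."""
    words = text.split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [g for g, c in Counter(grams).items() if c > 1]
```

If repetition is heavy, generation parameters such as `repetition_penalty` or `no_repeat_ngram_size` (both supported by `transformers` generation, to my knowledge) are worth experimenting with.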
> takes about 30s to generate a response on p3dn.24xlarge, which seems too long and not normal?

I haven't used that particular machine so I'm not sure what would be normal...
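To separate steady-state latency from one-time warm-up cost (the first call can include things like CUDA initialization), it helps to time several calls individually. A small helper sketch:

```python
import time

def time_call(fn, *args, **kwargs):
    """Time a single call; run it a few times to separate warm-up from steady state."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Timing the pipeline call two or three times in a row should show whether the 30s figure is a consistent per-request cost or mostly first-call overhead.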
Have you tried restarting the machine? Maybe restarting the kernel isn’t enough. Since it worked the first time before restarting the kernel, it seems there isn’t an issue with the...
I've fixed the dataset issue.