nanoGPT
Is it possible: davinci-003?
Can this approach be used to create a nano-sized text-davinci-003?
Wondering the same thing.
No. Not as is. There are two major stages to training these: the pretraining stage and the finetuning stage. This code does the former. The finetuning stage requires additional custom data and further training, either by simple finetuning or something like RLHF. But somehow you have to finetune it to actually follow instructions; otherwise it's more of a document completer when it comes out of pretraining.
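For the plain-finetuning path, nanoGPT can already initialize from pretrained GPT-2 weights and continue training on a new dataset. A minimal config sketch, modeled on nanoGPT's config/finetune_shakespeare.py — the dataset name `instruct` is a hypothetical placeholder for your own prepared instruction data:

```python
# config/finetune_instruct.py -- hypothetical finetuning config for nanoGPT's train.py
# run as: python train.py config/finetune_instruct.py

out_dir = 'out-instruct'
eval_interval = 5
eval_iters = 40
wandb_log = False

dataset = 'instruct'        # expects data/instruct/train.bin and val.bin (placeholder name)
init_from = 'gpt2-medium'   # start from pretrained GPT-2 weights instead of from scratch

# finetune with a small constant learning rate for only a few iterations
always_save_checkpoint = False
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 20

learning_rate = 3e-5
decay_lr = False
```

This only gives you supervised finetuning; the RLHF stage mentioned above is a separate training setup that nanoGPT does not implement.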
If you mean GPT-3 level, you're several billion parameters short. If you mean ChatGPT, then you need RLHF and finetuning on conversational data. The best I could get was using GPT-2 medium for the pretraining stage and finetuning on conversational data (here); still working on RLHF.
Thanks for your reply @karpathy. Also, I would like to get some clear advice from you on building an LLM.
Please correct me if my approach is wrong anywhere.
Step 1: Collect the data samples (tiny Shakespeare or OpenWebText data).
Step 2: Prepare the data.
Step 3: Train the model using your script.
Step 4: Evaluate, and stop iterating once the sampled output is human-readable.
Step 5: Save the weights.
Step 6: Collect an instruction dataset, e.g. databricks-dolly-15k, "an open source dataset of instruction-following records".
Step 7: Fine-tune the above model with the instruction dataset.
Step 8: Prepare the prompt with instruction, context, and question (see the sketch below).
Step 9: Use the prepared prompt to get predictions.
Is this correct?
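For steps 6–8, here is a minimal sketch of how one might flatten databricks-dolly-15k records into training text and build an inference prompt. The field names `instruction`, `context`, and `response` match the dolly-15k JSONL release; the prompt template itself is just one common convention, not something nanoGPT prescribes:

```python
import json

# Hypothetical prompt templates -- one convention among many,
# nanoGPT itself does not prescribe any prompt format.
PROMPT_WITH_CONTEXT = (
    "Instruction: {instruction}\n"
    "Context: {context}\n"
    "Response: "
)
PROMPT_NO_CONTEXT = (
    "Instruction: {instruction}\n"
    "Response: "
)

def format_record(rec):
    """Turn one dolly-15k record into a single training string."""
    if rec.get("context"):
        prompt = PROMPT_WITH_CONTEXT.format(**rec)
    else:
        prompt = PROMPT_NO_CONTEXT.format(**rec)
    return prompt + rec["response"]

# Steps 6-7: flatten the instruction dataset into one text file,
# then tokenize it with the usual nanoGPT prepare.py recipe.
with open("databricks-dolly-15k.jsonl") as f_in, open("input.txt", "w") as f_out:
    for line in f_in:
        f_out.write(format_record(json.loads(line)) + "\n\n")

# Steps 8-9: at inference time, fill the same template and let the
# finetuned model complete the "Response:" part.
prompt = PROMPT_NO_CONTEXT.format(
    instruction="Write an R function that computes a mean.")
```

The key point is that the same template is used at finetuning time and at inference time, so the model learns to treat "Response: " as the cue to answer.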
Also, I have a few questions.
I'm planning to build a code LLM for the R programming language, and I would like to get some clarity from you.
In your tutorial, you used tiny Shakespeare to train a model. Can I use the same tiny Shakespeare for the initial training, and then prepare an R programming instruction dataset to fine-tune it?
Or
Should I use a dataset that contains R code to train the model in the beginning, and then fine-tune it with an R programming instruction dataset?
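If you go the R-corpus route, the pretraining data prep would follow the same pattern as nanoGPT's data/shakespeare/prepare.py, just pointed at concatenated R source files. A minimal sketch, assuming your R files are already collected under an r_corpus/ directory (a placeholder path):

```python
# prepare.py -- sketch of nanoGPT-style data prep for an R code corpus,
# modeled on data/shakespeare/prepare.py; r_corpus/ is a placeholder path
import glob
import numpy as np
import tiktoken

# concatenate all R source files into one big training document
data = ""
for path in sorted(glob.glob("r_corpus/**/*.R", recursive=True)):
    with open(path, encoding="utf-8", errors="ignore") as f:
        data += f.read() + "\n\n"

n = len(data)
train_data = data[: int(n * 0.9)]
val_data = data[int(n * 0.9):]

# GPT-2 BPE, the same tokenizer nanoGPT uses for its prepared datasets
enc = tiktoken.get_encoding("gpt2")
train_ids = enc.encode_ordinary(train_data)
val_ids = enc.encode_ordinary(val_data)

# nanoGPT's train.py reads these uint16 .bin files via np.memmap
np.array(train_ids, dtype=np.uint16).tofile("train.bin")
np.array(val_ids, dtype=np.uint16).tofile("val.bin")
```

Whether Shakespeare pretraining would transfer to R at all is a separate question; the tokenization and .bin layout are the same either way.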
Could you please guide me here? Sorry, I know I have asked you so many questions, but it would be really helpful.
Thanks again 😊