ng-video-lecture
gpt.py: how do I save the model after training, and how do I use it so that it returns text to me like ChatGPT?
I have familiarized myself with the course's gpt.py; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed some text into it, and see how it responds.
As Andrej mentioned in the video, this is a decoder-only transformer. It will not respond based on a condition, since the architecture was not constructed as such. It would need an encoder part in the model that could later be used for conditioning, like Q&A.
Use torch.save() to save the model's and optimizer's state dicts, and torch.load() to load them.
Example: torch.save(model.state_dict(), 'params.pt'), and do the same for the optimizer.
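Here is a minimal sketch, assuming the names from the course's gpt.py (GPTLanguageModel, encode, decode, device, and the model's generate() method); adjust if your version differs:

```python
import torch

# After training: persist the weights (and the optimizer state, if you
# ever want to resume training rather than just run inference).
torch.save(model.state_dict(), 'params.pt')
torch.save(optimizer.state_dict(), 'optim.pt')

# Later, in a fresh process: rebuild the model with the same
# hyperparameters, load the weights, and continue a prompt.
model = GPTLanguageModel()
model.load_state_dict(torch.load('params.pt', map_location=device))
model.to(device)
model.eval()

prompt = "Some text for the model to continue"
idx = torch.tensor([encode(prompt)], dtype=torch.long, device=device)
out = model.generate(idx, max_new_tokens=200)
print(decode(out[0].tolist()))
```

Note that this only continues your text in the style of the training data; it won't answer questions unless you fine-tune it on conversations, as discussed below.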
The claim that this needs an encoder isn't entirely accurate. ChatGPT is a decoder-only model; that just means it differs from encoder-only models such as BERT and from seq2seq-style models. Decoder-only models do not actually need an encoder to perform their function: the input text you provide is simply the starting point for the continuous generation of subsequent tokens, as Andrej showed in the video when he explained that each training chunk contains several examples, one for each count of tokens seen so far and the next token to predict.
The terminology is confusing, certainly, but I just wanted to point out that if this is trained correctly, it can become a very small version of ChatGPT without any serious modification aside from scaling up the Blocks.
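To make the "several examples per chunk" point concrete, here is roughly what Andrej demonstrated, with made-up token ids: a chunk of block_size + 1 tokens packs block_size (context, next token) training examples.

```python
import torch

block_size = 8
chunk = torch.randint(0, 65, (block_size + 1,))  # pretend these are encoded characters
x, y = chunk[:block_size], chunk[1:block_size + 1]

# every prefix of x is a context, and the matching element of y is its target
for t in range(block_size):
    print(f"when input is {x[:t + 1].tolist()} the target is {y[t].item()}")
```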
Train on HuggingFace's OpenOrca dataset and add special tokens like <|imuser|> and <|imassistant|> to mark the turns. But make sure not to calculate the loss on the user turns, only on the assistant turns.
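A hypothetical sketch of that loss masking, with made-up shapes: PyTorch's cross_entropy skips any target labelled with its ignore_index, so you can blank out the user-turn positions before computing the loss.

```python
import torch
import torch.nn.functional as F

B, T, V = 2, 8, 100                              # made-up batch, sequence, vocab sizes
logits = torch.randn(B, T, V)                    # model output
targets = torch.randint(0, V, (B, T))            # next-token labels
assistant_mask = torch.zeros(B, T, dtype=torch.bool)
assistant_mask[:, T // 2:] = True                # pretend the reply is the second half

IGNORE = -100                                    # label id that F.cross_entropy skips
masked = targets.masked_fill(~assistant_mask, IGNORE)
loss = F.cross_entropy(logits.view(-1, V), masked.view(-1), ignore_index=IGNORE)
```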
Actually, this model is pretty small, so you'll need to bump up the hyperparameters. Use a more meaningful sub-word tokenization technique like Byte Pair Encoding, and train the model on a good text dataset. But the most important step towards a conversational model is to fine-tune it on text conversations (question/query and answer/response pairs).
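If you don't want to train a BPE tokenizer yourself, one option is OpenAI's tiktoken library, which ships GPT-2's vocabulary (a sketch, assuming tiktoken is installed):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")   # GPT-2's Byte Pair Encoding, ~50k tokens
ids = enc.encode("I have prepared a dataset.")
print(ids)                            # sub-word token ids instead of characters
print(enc.decode(ids))                # round-trips back to the original string

# swap this in for gpt.py's character-level vocab_size
vocab_size = enc.n_vocab
```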
The model will also need to know when to stop.
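One common way to handle stopping is to append a special end-of-response token to every assistant reply during fine-tuning and halt generation when the model emits it. A hypothetical sketch (generate_until_eos and eos_id are made-up names; the (logits, loss) forward signature follows gpt.py):

```python
import torch

@torch.no_grad()
def generate_until_eos(model, idx, eos_id, block_size, max_new_tokens=256):
    for _ in range(max_new_tokens):
        logits, _ = model(idx[:, -block_size:])        # crop to the context window
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, next_id), dim=1)
        if next_id.item() == eos_id:                   # the model signalled it is done
            break
    return idx
```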