nanoGPT
Add an opinionated guide for fine-tuning
It could be interesting to have a strongly opinionated guide from the author addressing some typical issues:
- Whether or not to freeze some layers while fine-tuning, and if so, which ones (see the first sketch after this list).
- Whether or not to freeze some tokens while fine-tuning, i.e. freezing part of the embedding matrix. If all the trainable tokens are new ones, this reduces to soft prompting (second sketch below).
- The weight to give prompt tokens when computing the loss. The OpenAI API currently uses 0.01 times the weight of completion tokens; in most other libraries it is simply one or zero (via masking). A sketch follows below.
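
To make the first point concrete, here is roughly what layer freezing looks like against nanoGPT's module names (`model.transformer.h`, `ln_f`, and the tied `wte`/`lm_head`). This is only an illustration, not a claim about what the guide should recommend; the number of blocks left trainable is a placeholder, and choosing it is exactly what the guide would help with:

```python
import torch
from model import GPT  # nanoGPT's model.py

# Load a pretrained GPT-2 checkpoint, as nanoGPT's finetuning configs do.
model = GPT.from_pretrained('gpt2')

# Freeze everything first...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the last few transformer blocks plus the final layer norm.
# How many blocks (if any) to leave trainable is the open question; 2 is arbitrary.
n_trainable_blocks = 2
for block in model.transformer.h[-n_trainable_blocks:]:
    for p in block.parameters():
        p.requires_grad = True
for p in model.transformer.ln_f.parameters():
    p.requires_grad = True

# Note: nanoGPT ties lm_head.weight to transformer.wte.weight (one tensor),
# so freezing or unfreezing one also affects the other.
```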
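For the second point, freezing only part of the embedding matrix can't be expressed with `requires_grad` alone, since that flag is per-tensor; a gradient hook that zeroes the frozen rows' gradients is one way to sketch it. The row split below (original GPT-2 vocab vs. newly added tokens) is only illustrative:

```python
import torch

# Rows [0, n_original_tokens) are the pretrained GPT-2 vocabulary we want to
# keep fixed; any newly added tokens would occupy the rows after that.
n_original_tokens = 50257

wte = model.transformer.wte.weight  # tied with lm_head.weight in nanoGPT

def zero_grad_for_original_rows(grad):
    # Return a modified gradient with the frozen rows zeroed out.
    grad = grad.clone()
    grad[:n_original_tokens] = 0.0
    return grad

wte.register_hook(zero_grad_for_original_rows)

# If only the new rows remain trainable and everything else is frozen, this is
# effectively soft prompting. Caveat: decoupled weight decay (AdamW) still
# shrinks "frozen" rows, so put this parameter in a weight_decay=0 group if
# strict freezing matters.
```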
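And for the third point, a minimal sketch of a per-token-weighted loss in the spirit of nanoGPT's `cross_entropy` call; `weighted_lm_loss` is a made-up helper name, and the 0.01 default just echoes the OpenAI value mentioned above:

```python
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, targets, is_prompt, prompt_loss_weight=0.01):
    """Per-token cross-entropy with prompt positions down-weighted.

    logits:    (B, T, vocab_size) model outputs
    targets:   (B, T) next-token targets, -1 where ignored (nanoGPT's convention)
    is_prompt: (B, T) bool, True at prompt positions
    prompt_loss_weight: 1.0 treats prompts like completions, 0.0 masks them out,
                        0.01 mirrors the OpenAI API default mentioned above.
    """
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        targets.view(-1),
        ignore_index=-1,
        reduction='none',
    ).view(targets.shape)
    weights = torch.ones_like(per_token)
    weights[is_prompt] = prompt_loss_weight
    weights = weights * (targets != -1)  # drop ignored positions from the average
    return (per_token * weights).sum() / weights.sum().clamp(min=1.0)
```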