FastChat
How to implement weight decay towards the pre-trained model?
Hello, I have a question.
When using FastChat for supervised fine-tuning, how do I add a penalty on the distance between the starting (pre-trained) weights and the current weights? This was shown to be effective in https://arxiv.org/abs/1706.03610
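For reference, here is a minimal PyTorch sketch of the idea from that paper (the L2-SP penalty, (alpha / 2) * ||w - w0||^2, where w0 are the weights at the start of fine-tuning). The helper names `snapshot_params` and `l2_sp_penalty` are hypothetical, not part of FastChat. The sketch assumes an unsharded model on one device with all trainable parameters taken from the pre-trained checkpoint. FSDP/DeepSpeed sharding and LoRA are not handled. Keeping the anchor copy also doubles the memory used by the trainable weights. Assuming FastChat's training script is built on `transformers.Trainer`, one possible hook would be an overridden `compute_loss` that adds this term to the task loss. That wiring is not shown here because the method's signature varies across `transformers` versions.

```python
import torch
import torch.nn as nn


def snapshot_params(model: nn.Module) -> dict[str, torch.Tensor]:
    """Hypothetical helper: detached copy of the trainable weights (w0) before fine-tuning."""
    return {
        name: p.detach().clone()
        for name, p in model.named_parameters()
        if p.requires_grad
    }


def l2_sp_penalty(model: nn.Module, anchor: dict[str, torch.Tensor], alpha: float) -> torch.Tensor:
    """Hypothetical helper: (alpha / 2) * sum of ||w - w0||^2 over the anchored parameters."""
    penalty = None
    for name, p in model.named_parameters():
        if name in anchor:
            term = (p - anchor[name]).pow(2).sum()
            penalty = term if penalty is None else penalty + term
    if penalty is None:
        # No anchored parameters: contribute nothing to the loss.
        return torch.zeros(())
    return 0.5 * alpha * penalty


# Toy usage: a small linear model stands in for the pre-trained LLM.
torch.manual_seed(0)
model = nn.Linear(4, 2)
anchor = snapshot_params(model)  # w0, taken once before training starts
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 4), torch.randn(8, 2)
for _ in range(5):
    task_loss = nn.functional.mse_loss(model(x), y)
    # The penalty pulls the weights back toward w0 rather than toward zero.
    loss = task_loss + l2_sp_penalty(model, anchor, alpha=0.01)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Unlike standard weight decay, the penalty's gradient is alpha * (w - w0), so it shrinks the weights toward the pre-trained values instead of toward zero. `alpha` is a hyperparameter you would need to tune.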