self-rewarding-lm-pytorch
What changes should I make to apply the method on Llama2?
I want to apply the self-rewarding and SPIN methods to Llama 2 with Alpaca-style fine-tuning datasets. What changes do I need to make to do this, and what config should I use? Thanks a lot!