rl-prompt
Repeating tokens in optimized prompt
Hello there,
I am working on applying your work in another setting, unrelated to text style transfer or classification. During evaluation, the model almost always produces prompts consisting of a single repeated token, such as ['Private', 'Private', 'Private', 'Private', 'Private', 'Private'] or ['Policy', 'Policy', 'Policy', 'Policy', 'Policy', 'Policy']. How can I improve the model's performance? I'd love to get your expert insights on which hyperparameters I should tune to achieve better results.