Generic Flan-T5 RL training
Hello,
I tried to use this library to train a FLAN-T5 model. I have seen that the examples cover sentiment analysis and summarization tasks, but in my case it is a generic generation task. To use my own dataset, I started from the config ppo_config_cnn_daily.yml.
The result I got is below, and the model seems to treat it as a summarization task:
Prompt: "[some text] ... from 1978, excluding finishing years incoming 62"
Output: "years incoming 62 incoming 62 incoming 62 incoming 62 incoming 62 incoming 62 incoming 62 incoming 62 incoming 62 incoming"
And I'm using this code with my personal prompts:
import trlx

trlx.train(
    reward_fn=reward_fn,
    prompts=train_prompts,
    eval_prompts=valid_prompts,
    config=config,
)
I have two problems, maybe caused by the summarization setup:
- First, the repetition at the end
- Second, the output is very different from what I expect
Should I change the code, or are there generic examples? Thanks
Can you provide the dataset or code that you used? You can try changing gen_kwargs in the config; for summarization I used greedy sampling to evaluate, but I am not sure it will work for your task.
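For reference, a greedy-decoding setup might look like the sketch below. This assumes the usual trlx config layout, where the values under method.gen_kwargs are forwarded to the Hugging Face generate call; the exact keys and defaults in your version of the file may differ:

```yaml
method:
  gen_kwargs:
    max_new_tokens: 100
    do_sample: false   # greedy decoding: always pick the highest-probability token
```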
@PhungVanDuy I'm using the standard configuration of ppo_config_cnn_daily.yml (the pipeline and trainer in that file).
As code, I'm using the snippet above; the only difference is that I'm using my own dataset.
With my dataset and a standard generation task, an expected output would be something like this:
Prompt: "[some text] ... from 1978, excluding finishing years incoming 62"
Output: "year: '1978', incoming: '62'"
Can you try adjusting gen_kwargs: reduce the number of new tokens and add sampling with a low temperature?
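A minimal sketch of that adjustment, assuming the standard trlx config layout where method.gen_kwargs is passed through to the Hugging Face generate call (the specific values are illustrative starting points, not tuned for your task):

```yaml
method:
  gen_kwargs:
    max_new_tokens: 40   # shorter generations limit runaway repetition at the end
    do_sample: true      # sample instead of decoding greedily
    temperature: 0.3     # low temperature keeps outputs focused while breaking repetition loops
    top_k: 0
    top_p: 1.0
```

If repetition persists, the repetition_penalty parameter of the generate call is another standard knob to try alongside these.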