TextRL

unfreeze_layer_from_past parameter

Open · JhonDan1999 opened this issue 9 months ago · 4 comments

Nice repo!!!

It seems that the default policy parameters freeze all layers of the language model and only update the lm_head. I tried the provided Flan-T5 example here: https://colab.research.google.com/drive/1DYHt0mi6cyl8ZTMJEkMNpsSZCCvR4jM1?usp=sharing
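(A quick way to confirm which parameters stay trainable: the snippet below is illustrative and simply reproduces the freeze-everything-but-lm_head behavior described above with plain PyTorch and transformers, not TextRL's internal code.)

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Illustrative: freeze everything except the LM head, as the default
# policy setup described above appears to do.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("lm_head")

print([n for n, p in model.named_parameters() if p.requires_grad])
# Expected: only the lm_head weight(s) remain trainable.
```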

I changed the value of unfreeze_layer_from_past to 1, so that the weights of Flan-T5's final layer are also updated (the original post showed this change in a screenshot).
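(For readers without the screenshot, the change amounts to something like this sketch. unfreeze_layer_from_past is the parameter from the issue title; the other TextRLActor arguments are illustrative stand-ins for the Colab example's setup.)

```python
from textrl import TextRLActor

# Sketch only: env, model, and tokenizer come from the Colab example.
actor = TextRLActor(
    env,
    model,
    tokenizer,
    unfreeze_layer_from_past=1,  # changed from the default, which freezes all layers
)
```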

With that change, the behavior changed: the actor starts to generate empty text (screenshot omitted).

Also, after training it gave me empty text (screenshot omitted).

What is the reason for this behavior?

NOTE: I did not change anything else in the Flan-T5 code example.

JhonDan1999 · Sep 20 '23 10:09

I observed the same thing. I also tried penalizing the generation of the '_' token directly in the reward function. Unfortunately, it does not seem to learn to stop generating the blank token...
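(For reference, such a penalty might look like the sketch below, built on the get_reward hook that TextRL's README documents for TextRLEnv subclasses. The penalty values and the blank-token check are illustrative, and the exact structure of predicted_list and the return value may differ between TextRL versions.)

```python
from textrl import TextRLEnv

class PenalizeBlankEnv(TextRLEnv):
    def get_reward(self, input_item, predicted_list, finish):
        reward = [0]
        if finish:
            # Join the predicted tokens into one string, following the
            # structure used in TextRL's examples (assumption).
            text = "".join(predicted_list[0])
            # Illustrative penalty: punish empty output or output that
            # consists only of the blank '_' token.
            if not text.strip() or set(text.strip()) <= {"_"}:
                reward = [-1.0]
            else:
                reward = [1.0]
        return reward
```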

barthelemymp · Sep 22 '23 18:09

Hi all, the issue is probably caused by https://github.com/huggingface/transformers/blob/bffac926ca6bc6c965a92bfbfd00c567a2c0fb90/src/transformers/models/t5/modeling_t5.py#L1147C8-L1147C8

It adds a position_bias to each layer's output, so the freshly unfrozen model will perform badly at initialization.
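(Context for the linked line: in transformers' T5 implementation, only the first block owns the relative-attention-bias embedding, and the bias tensor it computes is reused by every later block, so the layers are coupled through this shared tensor. A quick, runnable check:)

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")

# Only block 0 holds the relative-attention-bias embedding; later blocks
# reuse the position_bias tensor computed by block 0 at forward time.
print([
    block.layer[0].SelfAttention.has_relative_attention_bias
    for block in model.encoder.block
])
# -> [True, False, False, ...]
```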

voidful · Sep 28 '23 02:09

Hey! Did you guys figure out a solution to this problem? Thanks!

daniellucs2002 · Jan 18 '24 02:01

> Hey! Did you guys figure out a solution to this problem? Thanks!

Unfortunately not yet. I spent a lot of time trying to figure out a way to do it with this library, but I ended up leaving it (at least for now).

JhonDan1999 · Jan 18 '24 05:01