keyphrase-generation-rl

About training the RL model

Open sjchasel opened this issue 2 years ago • 7 comments

I have trained the catSeq model and its performance matches what you reported. Then I used the following command to train an RL model:

```
python3 train.py -data data/kp20k/kp20k_separated/rl/ -vocab data/kp20k/kp20k_separated/rl/ -exp_path=exp -exp catSeq_rl_kp20k -epochs 20 -model_path=model/catSeq_rl_9527 -copy_attention -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model=model/catSeq_9527/catSeq_kp20k.ml.one2many.cat.copy.bi-directional.epoch=3.batch=38098.total_batch=120000.model -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed=9527
```

The loss looks wrong from the beginning; it stays around -0.000x. What do you think might be the problem?

sjchasel · Aug 14 '22 11:08

I found that this is because the q value is very small, almost zero. Is that normal?

sjchasel · Aug 14 '22 11:08

I remember that the loss was very small from the beginning.

kenchan0226 · Aug 16 '22 02:08

> I remember that the loss was very small from the beginning.

I turned off early stopping, otherwise training would stop after four checkpoints. I have now trained for 4 epochs, but the loss is still -0.0001. Should I wait for it to train for all 20 epochs? I don't know whether this is normal for reinforcement learning models. How many epochs did you train your reinforcement learning model for? Or, if you have any trained RL models, could you share them? Thank you.

sjchasel · Aug 16 '22 08:08

Sorry, I do not have any pre-trained models since it has been more than three years. I remember that the best checkpoint is usually located at the 3rd or 4th epoch, so it is reasonable that the training script stops at the 4th epoch. I think it is normal to have a small loss in my RL training code.

kenchan0226 · Aug 17 '22 21:08
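For context on why a near-zero loss can be normal here: the command above uses `-baseline self`, i.e. a self-critical baseline. Below is a minimal sketch of that loss, not the repo's actual implementation; the function name `self_critical_loss` and the tensor shapes are illustrative assumptions.

```python
import torch

def self_critical_loss(sample_log_probs: torch.Tensor,
                       sample_reward: torch.Tensor,
                       greedy_reward: torch.Tensor) -> torch.Tensor:
    """REINFORCE with a self-critical (greedy-decoding) baseline.

    sample_log_probs: (batch,) summed log-probs of the sampled sequences
    sample_reward:    (batch,) reward (e.g. an F1-based score) of the samples
    greedy_reward:    (batch,) reward of the greedy-decoded sequences (baseline)
    """
    # The advantage is the "q value" discussed above: reward minus baseline.
    # No gradient flows through the rewards, only through the log-probs.
    advantage = (sample_reward - greedy_reward).detach()
    # Maximizing expected reward = minimizing the negative
    # advantage-weighted log-likelihood.
    return -(advantage * sample_log_probs).mean()
```

Since training is warm-started from a maximum-likelihood checkpoint, the sampled sequences usually score almost as well as the greedy ones, so the advantage, and hence the reported loss, hovers near zero (e.g. -0.0001) even while training still makes progress.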

Can I ask some questions?

Struggle-lsl · Nov 12 '22 02:11

Why is the prediction all

Struggle-lsl · Nov 13 '22 08:11
