keyphrase-generation-rl

About training the RL model

Open sjchasel opened this issue 2 years ago • 7 comments

I have trained the catSeq model and its performance matches what you reported. Then I used the following command to train an RL model:

```
python3 train.py -data data/kp20k/kp20k_separated/rl/ -vocab data/kp20k/kp20k_separated/rl/ -exp_path=exp -exp catSeq_rl_kp20k -epochs 20 -model_path=model/catSeq_rl_9527 -copy_attention -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model=model/catSeq_9527/catSeq_kp20k.ml.one2many.cat.copy.bi-directional.epoch=3.batch=38098.total_batch=120000.model -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed=9527
```

The loss looks wrong from the beginning; it stays around -0.000x. What do you think might be the problem?

sjchasel · Aug 14 '22 11:08

I found that this is because the q value is very small, almost zero. Is that normal?

sjchasel · Aug 14 '22 11:08

I remember that the loss was very small from the beginning.

kenchan0226 · Aug 16 '22 02:08

> I remember that the loss was very small from the beginning.

I turned off early stopping, otherwise training would stop after four checkpoints. I have now trained for 4 epochs, but the loss is still -0.0001. Should I wait for it to train for all 20 epochs? I don't know whether this is normal for reinforcement learning models. How many epochs did you train your reinforcement learning model for? Or, if you have any trained RL models, could you share them? Thank you.

sjchasel · Aug 16 '22 08:08

Sorry, I do not have any pre-trained models since it has been more than three years. I remember that the best checkpoint is usually located at the 3rd or 4th epoch, so it is reasonable that the training script stops at the 4th epoch. I think it is normal to have a small loss in my RL training code.

kenchan0226 · Aug 17 '22 21:08
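For context on why a near-zero loss can be normal here: the command above uses `-baseline self`, i.e. a self-critical baseline. Below is a minimal sketch of that loss, not the repo's actual implementation; the function name `self_critical_loss` and the tensor shapes are illustrative assumptions.

```python
import torch

def self_critical_loss(sample_log_probs: torch.Tensor,
                       sample_reward: torch.Tensor,
                       greedy_reward: torch.Tensor) -> torch.Tensor:
    """REINFORCE with a self-critical (greedy-decoding) baseline.

    sample_log_probs: (batch,) summed log-probs of the sampled sequences
    sample_reward:    (batch,) reward (e.g. an F1-based score) of the samples
    greedy_reward:    (batch,) reward of the greedy-decoded sequences (baseline)
    """
    # The advantage is the "q value" discussed above: reward minus baseline.
    # No gradient flows through the rewards, only through the log-probs.
    advantage = (sample_reward - greedy_reward).detach()
    # Maximizing expected reward = minimizing the negative
    # advantage-weighted log-likelihood.
    return -(advantage * sample_log_probs).mean()
```

Since training is warm-started from a maximum-likelihood checkpoint, the sampled sequences usually score almost as well as the greedy ones, so the advantage, and hence the reported loss, hovers near zero (e.g. -0.0001) even while training still makes progress.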

Can I ask some questions?

Struggle-lsl · Nov 12 '22 02:11

Why is the prediction all

Struggle-lsl · Nov 13 '22 08:11
