RLSeq2Seq
Cannot reproduce the result of pointer-generator with the coverage mechanism; always inferior to the pgen model.
My batch_size is 64. I pretrain my model for about 50000 iterations and get a better result than pgen's. Then I turn on the coverage mechanism and train the model for another 2000 iterations. The coverage loss does not decrease to 0.2, the value mentioned for the pgen model. The final result on the ROUGE-1 metric is about 38.90. Are there any tricks for adding the coverage mechanism? How can I get a result similar to the pgen model?
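For reference, the coverage loss I mean is the one from the See et al. pointer-generator paper, covloss_t = sum_i min(a_i^t, c_i^t). A rough sketch of how I understand it is computed (tensor names here are my own guesses and may differ from the repo's code):

    import tensorflow as tf

    def coverage_loss(attn_dists, dec_padding_mask):
        # attn_dists: list (one per decoder step) of [batch_size, attn_len] attention distributions
        # dec_padding_mask: [batch_size, max_dec_steps] with 1s for real tokens, 0s for padding
        coverage = tf.zeros_like(attn_dists[0])  # running sum of attention so far
        covlosses = []
        for t, a in enumerate(attn_dists):
            # Penalize re-attending to source positions that already received attention.
            covloss = tf.reduce_sum(tf.minimum(a, coverage), axis=1)  # [batch_size]
            covlosses.append(covloss * dec_padding_mask[:, t])
            coverage += a
        dec_lens = tf.reduce_sum(dec_padding_mask, axis=1)  # actual decoder lengths
        per_example = tf.add_n(covlosses) / dec_lens
        return tf.reduce_mean(per_example)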
No, this issue is well-discussed on the original pointer-generator model page. Every time you run this model it will generate a different result, due to the multi-processing batching used in this model. The only solution I usually use for fixing my model parameters is to use 1 queue for batching and to make sure to seed the randomizers throughout the framework. Try setting these parameters to 1: example_queue_threads, batch_queue_threads.
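A minimal sketch of what I mean by seeding the randomizers (where exactly you call this depends on the code you are running, and the seed value is just an example):

    import random
    import numpy as np
    import tensorflow as tf

    def set_global_seeds(seed=111):
        # Seed every RNG the batcher and the graph might touch.
        random.seed(seed)         # Python RNG (e.g. example shuffling in the batcher)
        np.random.seed(seed)      # NumPy RNG
        tf.set_random_seed(seed)  # TensorFlow graph-level seed (TF 1.x API)

Together with example_queue_threads = 1 and batch_queue_threads = 1, this makes the batching order deterministic across runs.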
If you vary the seed parameter, you might manage to get an even better result than the original paper. I've gotten better results myself, as presented in our latest paper.
My personal experience is that the running average loss (at least the way it is defined in this paper) is not the best indicator for selecting the best evaluation model. In the above paper, I'm using the average ROUGE reward during evals as another way of saving my best model, and it sometimes works better than the running average loss.
Well, thanks for your response. I'll try the methods you mentioned above to deal with the coverage mechanism.
You said you use the ROUGE reward during evals. As far as I know, the calculation of ROUGE is quite slow. How do you implement this metric to evaluate a given ckpt? And which ROUGE metric do you use for evaluation: 1, 2, or L?
Yes, it's quite slow and will increase the evaluation time per batch by two to three times (without ROUGE-based eval, each evaluation takes around 0.5 seconds on a P100 GPU with batch size 8, but with ROUGE it rises to about 1.5 seconds, which is still fine for my case). Also, I'm using ROUGE-L to pick the best training ckpt.
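Roughly, the per-batch reward is just the average ROUGE-L F1 between the decoded summaries and the references. A sketch using the pip `rouge` package (my own code may use a different ROUGE implementation, and `decoded_sents` / `reference_sents` are placeholder names for the detokenized eval outputs):

    from rouge import Rouge

    _rouge = Rouge()

    def batch_rouge_l(decoded_sents, reference_sents):
        # Average ROUGE-L F1 over a batch of (hypothesis, reference) string pairs.
        scores = _rouge.get_scores(decoded_sents, reference_sents)
        return sum(s['rouge-l']['f'] for s in scores) / len(scores)

You can then keep a running average of this reward during eval and save the checkpoint whenever the average improves, instead of (or in addition to) tracking the running average loss.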
Excuse me, when evaluating, is it necessary to run the train operation as well? The function run_train_steps() returns:

    to_return = {
        'train_op': self._shared_train_op,
        'summaries': self._summaries,
        'pgen_loss': self._pgen_loss,
        'global_step': self.global_step,
        'decoder_outputs': self.decoder_outputs
    }

However, run_eval_steps() returns:

    to_return = {
        'summaries': self._summaries,
        'pgen_loss': self._pgen_loss,
        'global_step': self.global_step,
        'decoder_outputs': self.decoder_outputs
    }

When I ran the eval steps, the model did not update and the average loss stayed the same. Is anything wrong in my running process? Looking forward to your reply, thank you very much.
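For context, my eval loop looks roughly like the following (names and signatures are from my local copy and may differ from the repo); it never runs train_op itself and only restores checkpoints written by the separate training job:

    import tensorflow as tf

    def run_eval_loop(model, batcher, train_dir):
        # model and batcher are built as in training; train_dir is where the
        # training job writes its checkpoints.
        saver = tf.train.Saver()
        with tf.Session() as sess:
            while True:
                # Load the most recent checkpoint produced by the training job.
                ckpt_path = tf.train.latest_checkpoint(train_dir)
                saver.restore(sess, ckpt_path)
                batch = batcher.next_batch()
                results = model.run_eval_steps(sess, batch)  # no 'train_op' here
                print('eval loss: %.4f at step %d' %
                      (results['pgen_loss'], results['global_step']))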