PreSumm
The candidate results of all the samples are the same
Hello! First, thanks for your contribution. When I try to test BertSumAbs, the command is:
python train.py -task abs -mode test -batch_size 30 -test_batch_size 5 -bert_data_path ../bert_data_cnndm_final/cnndm -log_file ../logs/val_abs_bert_cnndm_eng -model_path ../models/abs_trans_eng/ -sep_optim true -use_interval true -visible_gpus 0 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../results/abs_bert_cnndm_eng/ -test_from ../models/abs_trans_eng/model_step_200000.pt
I got a wrong candidate file. For example, PreSumm-master/results/abs_bert_cnndm_eng/.200000.candidate looks like this:
new : new : : : new york 's the u.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s
(the ".s" run continues for dozens more tokens, and this same degenerate sentence is repeated verbatim for every test sample)
It seems like all the test samples got the same prediction result, and the result is very bad. But both the ".200000.gold" and ".200000.raw_src" files are correct. Is there anything wrong?
Mine is also getting the same results. All outputs are the same.
Me too.
Please paste your training commands here
python3 train.py -task abs -mode train -bert_data_path bert_data/ -dec_dropout 0.2 -model_path model_abs/ -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0 -log_file abs_bert_cnndm
With only 1 GPU for training, you need to accumulate gradients over many more steps, or the model cannot be trained effectively.
Sir, but I have only one GPU. Can't the training be effective on that?
You can use our Trained Models.
> With only 1 GPU for training, you need to accumulate gradients over many more steps, or the model cannot be trained effectively.
So how many GPUs do you need for training?
For extractive summarization, the author trained the model on 3 GPUs.
For abstractive summarization, the author trained the model on 4 GPUs for 2 days.
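As a concrete illustration of the gradient-accumulation advice above, here is a generic single-GPU accumulation loop (a minimal sketch, not PreSumm's actual training code; the model is assumed to return a scalar loss):

```python
import torch

def train(model, loader, optimizer, accum_count=20):
    # Summing gradients over accum_count mini-batches before each optimizer
    # step emulates an accum_count-times-larger batch on a single GPU.
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader, start=1):
        loss = model(batch)              # assumed to return a scalar loss
        (loss / accum_count).backward()  # scale so the accumulated gradient
                                         # matches the mean over the large batch
        if step % accum_count == 0:
            optimizer.step()
            optimizer.zero_grad()
```

In PreSumm, the corresponding knob is the -accum_count flag shown in the commands above.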
I have faced the same repetition issue when training a Korean model. After some research, I found that this is a general problem in natural language generation, known as degeneration.
I have added an extra module to the decoder, replacing beam search.
Please let me know if anyone is interested in it.
Thanks
@robinsongh381 We are interested!
So you replaced beam search and got better results?
@Colanim Sorry for the late reply. I have replaced the decoder with the following method, which proposes a new way of sampling tokens at decoding steps rather than just depending on beam search.
The paper suggested two methods, and they are implemented here.
From my experience, I could avoid the repetition issue with the proposed method and hence improve the ROUGE score!
Hope my opinion helps.
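(The two methods referenced are presumably top-k and nucleus/top-p sampling from the neural-text-degeneration literature; below is a minimal sketch of the filtering step, assuming a 1-D logits vector over the vocabulary at each decoding step, with purely illustrative hyperparameters and vocabulary size:)

```python
import torch
import torch.nn.functional as F

def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float("inf")):
    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth_best = torch.topk(logits, top_k)[0][-1]
        logits[logits < kth_best] = filter_value
    # Nucleus (top-p): keep the smallest set of tokens whose cumulative
    # probability exceeds top_p.
    if top_p > 0.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        remove = cumulative > top_p
        remove[1:] = remove[:-1].clone()  # shift right: keep the first token past p
        remove[0] = False
        logits[sorted_indices[remove]] = filter_value
    return logits

# One decoding step: sample from the filtered distribution instead of
# extending beams greedily (30522 is just BERT's vocab size, for illustration).
logits = torch.randn(30522)
probs = F.softmax(top_k_top_p_filtering(logits, top_k=50, top_p=0.95), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```

Sampling like this trades the determinism of beam search for diversity, which is what breaks the repetition loops.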
Thanks for the message!
Do you remember (approximately) how big the difference in ROUGE score was?
> @Colanim Sorry for the late reply. I have replaced the decoder with the following method, which proposes a new way of sampling tokens at decoding steps rather than just depending on beam search.
> The paper suggested two methods, and they are implemented here.
> From my experience, I could avoid the repetition issue with the proposed method and hence improve the ROUGE score!
> Hope my opinion helps.
Can you share your results?
Hi. I am using the pre-trained models for testing on the CNN dataset. This is the command I am giving: python train.py -task abs -mode test -test_from ~/Downloads/cnndm_baseline_best.pt -batch_size 3000 -test_batch_size 500 -bert_data_path ../bert_data/test -log_file ../logs/val_abs_bert_cnndm -sep_optim true -use_interval true -visible_gpus 0 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../logs/abs_bert_cnndm
But my candidate result is the same as for the very first file I used; it hasn't changed since. What am I doing wrong?
You said that when training on 1 GPU we need to set the gradient accumulation count greater than 5. How much should it be? Please help.
> With only 1 GPU for training, you need to accumulate gradients over many more steps, or the model cannot be trained effectively.
> python3 train.py -task abs -mode train -bert_data_path bert_data/ -dec_dropout 0.2 -model_path model_abs/ -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0 -log_file abs_bert_cnndm
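As a back-of-the-envelope estimate (assuming the authors' abstractive setup was 4 GPUs with -batch_size 140 and -accum_count 5, as in the command quoted above): the effective batch is roughly batch_size × accum_count × num_gpus = 140 × 5 × 4 = 2800. To match that on a single GPU with the same -batch_size 140, you would need roughly -accum_count 20, since 140 × 20 × 1 = 2800.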
Did you solve the problem of identical generated sentences when using a single GPU?