MojiTalk
Failed to replicate the top-1/top-5 accuracy results for the CVAE model
Hi,
I am trying to replicate your chatbot model. I followed exactly what you described in the README and got more or less the same KL/reconstruction loss when training the CVAE. But for emoji classification, the top-1 and top-5 accuracy of the CVAE model is only 30.4% and 54.3%, which is much worse than the results you reported in the paper. Can you give some suggestions about this?
Thanks Peng
What is the performance of your Base model? I suggest choosing the breakpoint of a not-yet-converged Base model as the pretrained model and training your CVAE model starting from there. Also, you may need to stop the training of the emoji classifier before it overfits.
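A minimal sketch of what warm-starting the CVAE from an earlier Base breakpoint could look like in TF 1.x (the checkpoint path, step number, and variable filter below are hypothetical, not taken from the repo):

```python
import tensorflow as tf

# Hypothetical sketch: restore a not-yet-converged Base breakpoint and use it
# as the initialization for CVAE training. Names and paths are assumptions.
base_ckpt = "ckpt/base/model.ckpt-18000"  # an earlier Base breakpoint

# Restore only the variables shared with the Base model; CVAE-only variables
# (e.g. recognition/prior networks) keep their fresh initialization.
shared_vars = [v for v in tf.global_variables() if "cvae" not in v.name]
saver = tf.train.Saver(var_list=shared_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, base_ckpt)
    # ... continue with CVAE training from this initialization ...
```

The same idea applies to the emoji classifier: keep the checkpoint with the best validation accuracy rather than the last one, so it does not overfit.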
Thanks for your reply. The Base model achieves a test-set perplexity of 134.244/132.922 at step 18000/27500, and I chose the breakpoint at step 18000 as the starting point for CVAE training. After training the CVAE model, I got a reconstruction/KL loss of 42.426/26.412. For the emoji classifier, the best model on the test set is 'step': 9000, 'epoch': 2, 'accuracy': 0.3211703300476074, 'loss': 2.8405072689056396, 'top_5_accuracy': 0.5782781839370728, which is the same as what you reported.
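(For reference, perplexity figures like these are typically derived from the masked reconstruction cross-entropy; a minimal sketch with purely illustrative numbers, not the repo's evaluation code:)

```python
import numpy as np

# Illustrative only: perplexity = exp(summed masked cross-entropy / number of target tokens).
total_ce = 4.9e6      # sum of masked cross-entropy over all test target tokens (made up)
total_tokens = 1.0e6  # number of non-padding target tokens (made up)
print(np.exp(total_ce / total_tokens))  # ~134, the order of magnitude reported above
```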
I got a top-1/top-5 accuracy of 0.304/0.543 for the CVAE model from rl_run.py, which reports the test-set performance before training starts. Any ideas about the problem?
Best Peng
This looks like the result of the Base model. Have you tried printing the accuracy results of your Base model?
Thanks for your comment. I will try the base model and let you know the result.
Sorry for the late reply.
I ran the Base model and its top-1/top-5 accuracy is 0.349/0.575, which is similar to what you reported in the paper. However, the CVAE model is even worse than the baseline seq2seq.
Please let me know your opinion.
Thanks! Peng
Hi! I ran into the same accuracy problem discussed here when replicating the model, and I am still confused about why. If you could give me some advice, I would really appreciate it.
@claude-zhou
Dear Zhou,
I am trying to implement your paper in PyTorch.
While reading your code, I found that the output used to compute the test loss (perplexity) is the output of the training decoder, not the inference decoder.
The code has the comment "use inference decoder's logits to compute recon_loss",
but the logits are actually the output of the training decoder, not the inference decoder's.
Shouldn't we use the output of the inference decoder to calculate the test loss (perplexity)?
```python
with tf.variable_scope("loss"):
    max_time = tf.shape(self.rep_output)[0]
    with tf.variable_scope("reconstruction"):
        # TODO: use inference decoder's logits to compute recon_loss
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(  # ce = [len, batch_size]
            labels=self.rep_output,
            logits=self.logits)  # rep: [len, batch_size]; logits: [len, batch_size, vocab_size]
        target_mask = tf.sequence_mask(
            self.rep_len + 1, max_time, dtype=self.logits.dtype)  # time_major
        target_mask_t = tf.transpose(target_mask)  # max_len batch_size
        self.recon_losses = tf.reduce_sum(cross_entropy * target_mask_t, axis=0)
        self.recon_loss = tf.reduce_sum(cross_entropy * target_mask_t) / batch_size
```
```python
# Dynamic decoding
infer_outputs, _, infer_lengths = seq2seq.dynamic_decode(
    decoder,
    maximum_iterations=maximum_iterations,
    output_time_major=True,
    swap_memory=True,
    scope=decoder_scope)

if beam_width > 0:
    self.result = infer_outputs.predicted_ids
else:
    self.result = infer_outputs.sample_id
self.result_lengths = infer_lengths
```
I think we should use `infer_outputs` to compute the cross-entropy and then the perplexity from it.
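A rough sketch of what that could look like, building on the snippets above and assuming greedy decoding (`beam_width == 0`), so that `infer_outputs` is a `BasicDecoderOutput` whose `rnn_output` holds the logits; the padding/cropping below is only illustrative and not from the repo:

```python
# Hypothetical sketch: score the reference reply with the inference decoder's logits.
# Note: at step t these logits are conditioned on the model's own previous predictions,
# not on the ground-truth prefix, so this is not the usual teacher-forced perplexity.
infer_logits = infer_outputs.rnn_output  # [infer_len, batch_size, vocab_size], time-major

# The inference decoder may stop before (or run past) the reference length,
# so pad/crop its logits to the reference max_time before computing the loss.
max_time = tf.shape(self.rep_output)[0]
pad = tf.maximum(max_time - tf.shape(infer_logits)[0], 0)
infer_logits = tf.pad(infer_logits, [[0, pad], [0, 0], [0, 0]])[:max_time]

infer_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=self.rep_output, logits=infer_logits)                # [max_time, batch_size]
target_mask_t = tf.transpose(tf.sequence_mask(
    self.rep_len + 1, max_time, dtype=infer_logits.dtype))      # [max_time, batch_size]
infer_recon_loss = tf.reduce_sum(infer_ce * target_mask_t) / batch_size
# Perplexity would then be exp(total masked cross-entropy / total target tokens).
```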
I would appreciate it very much if you could give some advice. @claude-zhou