
failed to replicate the result of top1, top5 acc for CVAE model

Open xp1992slz opened this issue 5 years ago • 10 comments

Hi,

I am trying to replicate your chatbot model. I did exactly what you describe in the README and got roughly similar KL/reconstruction loss when training the CVAE. But for emoji classification, the top-1 and top-5 accuracy for the CVAE model are only 30.4% and 54.3%, which is much worse than the results you reported in the paper. Can you give some suggestions about this?

Thanks Peng

xp1992slz avatar Apr 16 '19 10:04 xp1992slz

What is the performance of your Base model? I suggest choosing a checkpoint of a not-yet-converged Base model as the pretrained model and training your CVAE model from there. Also, you may need to stop training the emoji classifier before it overfits.
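The early-stopping advice for the classifier can be sketched as follows. This is a minimal illustration, not code from the repo; `train_step`, `validate`, and `patience` are assumed names:

```python
def train_with_early_stopping(train_step, validate, max_steps,
                              eval_every=1000, patience=3):
    """Stop training once validation accuracy fails to improve
    `patience` evaluations in a row; return the best step seen."""
    best_acc, best_step, bad_evals = 0.0, 0, 0
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            acc = validate()
            if acc > best_acc:
                best_acc, best_step, bad_evals = acc, step, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    break
    return best_step, best_acc
```

The checkpoint at `best_step` would then be the one to use for classifying generated responses, rather than the final (overfit) one.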

claude-zhou avatar Apr 16 '19 17:04 claude-zhou

Thanks for your reply. The Base model achieves a perplexity of 134.244/132.922 at steps 18000/27500 on the test set, and I chose the checkpoint at step 18000 as the starting point for CVAE training. After training the CVAE model, I got a reconstruction/KL loss of 42.426/26.412. For the emoji classifier, the best model on the test set is 'step': 9000, 'epoch': 2, 'accuracy': 0.3211703300476074, 'loss': 2.8405072689056396, 'top_5_accuracy': 0.5782781839370728, which matches what you reported.

I got the top-1/top-5 accuracy of 0.304/0.543 for the CVAE model from rl_run.py, which reports test-set performance before training starts. Any ideas about the problem?
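For anyone comparing these numbers: perplexity is just the exponential of the average per-token cross-entropy, so the reported loss and perplexity values can be related with a one-liner (the token count below is an illustrative assumption, not a number from the repo):

```python
import math

def perplexity(total_cross_entropy, num_tokens):
    """Perplexity = exp(average per-token cross-entropy)."""
    return math.exp(total_cross_entropy / num_tokens)
```

Note this means losses normalized per batch (as in the repo's `recon_loss`) are not directly comparable to per-token perplexities without dividing by the token count first.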

Best Peng

xp1992slz avatar Apr 17 '19 07:04 xp1992slz

This looks like the result of the Base model. Have you tried printing the acc results of your Base model?

claude-zhou avatar Apr 18 '19 02:04 claude-zhou

Thanks for your comment. I will try the base model and let you know the result.

xp1992slz avatar Apr 22 '19 22:04 xp1992slz

Sorry for the late reply.

I ran the Base model and the top-1/top-5 accuracy is 0.349/0.575, which is similar to what you reported in the paper. However, the CVAE model is even worse than the baseline seq2seq.

Please let me know your opinion.

Thanks! Peng

xp1992slz avatar May 05 '19 15:05 xp1992slz

Hi! I ran into the same accuracy problem discussed here when replicating the model, and I am still not sure why. I would really appreciate any advice.

Wardwarf-Li avatar May 17 '19 08:05 Wardwarf-Li

@claude-zhou Dear Zhou, I am attempting to implement your paper in PyTorch. While reading your code, I found that the output used to calculate the test loss (perplexity) is the output of the training decoder, not the inference decoder. The TODO in the code says "use inference decoder's logits to compute recon_loss", but the logits are the output of the training decoder, not the inference decoder's. Shouldn't we use the output of the inference decoder to calculate the test loss (perplexity)?

```python
with tf.variable_scope("loss"):
    max_time = tf.shape(self.rep_output)[0]
    with tf.variable_scope("reconstruction"):
        # TODO: use inference decoder's logits to compute recon_loss
        # ce = [len, batch_size]
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=self.rep_output,  # rep: [len, batch_size]
            logits=self.logits)      # logits: [len, batch_size, vocab_size]
        target_mask = tf.sequence_mask(
            self.rep_len + 1, max_time, dtype=self.logits.dtype)  # time_major
        target_mask_t = tf.transpose(target_mask)  # [max_len, batch_size]
        self.recon_losses = tf.reduce_sum(cross_entropy * target_mask_t, axis=0)
        self.recon_loss = tf.reduce_sum(cross_entropy * target_mask_t) / batch_size

# Dynamic decoding
infer_outputs, _, infer_lengths = seq2seq.dynamic_decode(
    decoder,
    maximum_iterations=maximum_iterations,
    output_time_major=True,
    swap_memory=True,
    scope=decoder_scope)
if beam_width > 0:
    self.result = infer_outputs.predicted_ids
else:
    self.result = infer_outputs.sample_id
self.result_lengths = infer_lengths
```

KingS770234358 avatar Feb 27 '20 12:02 KingS770234358

```python
# Dynamic decoding
infer_outputs, _, infer_lengths = seq2seq.dynamic_decode(
    decoder,
    maximum_iterations=maximum_iterations,
    output_time_major=True,
    swap_memory=True,
    scope=decoder_scope)
if beam_width > 0:
    self.result = infer_outputs.predicted_ids
else:
    self.result = infer_outputs.sample_id
self.result_lengths = infer_lengths
```

KingS770234358 avatar Feb 27 '20 12:02 KingS770234358

I am thinking that we should use `infer_outputs` to calculate the cross-entropy and, from that, the perplexity.
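A minimal numpy sketch of what this would look like, assuming the inference decoder exposes per-step logits (e.g. `rnn_output` from a greedy `BasicDecoder`; beam search only yields token ids, so this would not apply there). The function name and shapes are illustrative assumptions, not the repo's API:

```python
import numpy as np

def masked_cross_entropy(logits, labels, lengths):
    """Sparse softmax cross-entropy summed over valid time steps.

    logits:  [max_time, batch, vocab] from the inference decoder
    labels:  [max_time, batch] target token ids
    lengths: [batch] number of valid steps per example
    """
    max_time, batch, _ = logits.shape
    # numerically stable log-softmax over the vocab dimension
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # pick the log-probability of each target token
    ce = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    # zero out time steps past each sequence's length
    mask = np.arange(max_time)[:, None] < lengths[None, :]
    return (ce * mask).sum()
```

One caveat: with free-running decoding the generated length can differ from the reference length, so the two sequences would need to be aligned or padded to a common `max_time` before computing this.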

KingS770234358 avatar Feb 27 '20 12:02 KingS770234358

I would very much appreciate any advice you could give @claude-zhou

KingS770234358 avatar Feb 27 '20 12:02 KingS770234358