glimpse_clouds
Some things look different from the paper.
It seems a few things are implemented differently from the paper:
- About encouraging diversity of attention points:
dist_btw_glimpses = mse_loss(attention_points_1, attention_points_2)
The value dist_btw_glimpses is simply added to the loss with the weight alpha_encourage_diversity=1.0. Since the training loss is minimized, I think this discourages diversity rather than encouraging it.
- About the memory bank and its slots: it seems there is no operation for deleting old slots, so the memory is not limited to K slots (see the sketch after this list).
- About D being trained end-to-end but not saved. The paper says:
After pre-training, D is trained end-to-end.
In the code, it seems that D is trained but not saved, and it is re-computed on the first batch whenever the model is reloaded. So it seems pointless to train D.
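For illustration, here is a minimal sketch of a memory bank capped at K slots, with deletion of the oldest slot. All names here are hypothetical; this is not the repo's actual code:

```python
import torch

class MemoryBank:
    """Toy memory bank limited to K slots (hypothetical sketch)."""

    def __init__(self, max_slots: int = 8):
        self.max_slots = max_slots  # K in the paper
        self.slots = []             # list of per-timestep feature tensors

    def add(self, features: torch.Tensor) -> None:
        # Append the newest slot and drop the oldest one once the bank
        # exceeds K entries; this is the deletion step discussed above.
        self.slots.append(features)
        if len(self.slots) > self.max_slots:
            self.slots.pop(0)

    def read(self) -> torch.Tensor:
        # Stack the (at most K) slots, e.g. for attention over the memory.
        return torch.stack(self.slots, dim=0)
```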
Hi @Haetsal-Lee,
First of all, thanks for your interest in our work. Below are my answers, point by point:
- I was confused about this point, but I think you are right: the function should return
1/(1 + dist_btw_glimpses)
instead of returning dist_btw_glimpses
(cf. L_G_1 in Eq. 15 of the paper). A sketch of the corrected term follows this list.
- Since we are dealing with short sequences (8 frames only), we are not deleting old slots, so the memory is limited to K=8 slots in our case.
- Indeed, I have just checked the checkpoint, and it seems that D is not part of the model; I am sorry for that. For my experiments, I evaluated on the val/test set right after training, without saving the model to a checkpoint, so I was using the D trained on the training set. I apologize for this issue and will update the checkpoint as soon as possible.
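For reference, a minimal sketch of the corrected diversity term. The variable names follow the snippet above; the helper function itself is hypothetical, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def diversity_loss(attention_points_1: torch.Tensor,
                   attention_points_2: torch.Tensor,
                   alpha_encourage_diversity: float = 1.0) -> torch.Tensor:
    # Distance between the two sets of attention points, as in the repo.
    dist_btw_glimpses = F.mse_loss(attention_points_1, attention_points_2)
    # Minimizing 1/(1 + d) pushes the distance d up, spreading the
    # glimpses apart; minimizing d itself would pull them together.
    return alpha_encourage_diversity / (1.0 + dist_btw_glimpses)
```

Adding this value to the total loss should now encourage diversity, since the optimizer increases dist_btw_glimpses in order to shrink the term.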
Is an update of the checkpoint for this issue still planned?
Hi @HochulHwang, for the moment I will not be able to update the checkpoint. I will keep this issue open, and I would be happy if anyone feels like solving it.
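For anyone who wants to pick this up, here is a minimal sketch of how D could be saved with and restored from the checkpoint. The key names, and the assumption that D is a plain tensor, are hypothetical; adapt them to the actual code:

```python
import torch

def save_checkpoint(model, D, optimizer, path="checkpoint.pth.tar"):
    # Save D explicitly next to the model weights so it survives reloads.
    torch.save({
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "D": D,
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pth.tar"):
    # Restore the weights and return the trained D, instead of
    # re-computing it on the first batch after reloading.
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    return ckpt["D"]
```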
Hoping somebody can solve this issue soon.