c-swm Implementation of hinge loss does not match the paper

Hello Kipf, I find there is a discrepancy between the loss mentioned in the paper.

According to Eq(5) in paper, for negative samples, you calculate the Euclidean distance between negative state sample at timestep t and state at timestep t+1.

However, in the code below, state and neg_state are both at timestep t.

self.neg_loss = torch.max(
    zeros, self.hinge - self.energy(
        state, action, neg_state, no_trans=True)).mean()

I noticed that the same question was also asked here.

I want to know if this is a bug ? Does the discrepancy affect the final performance ?

Feb 07 '21 10:02 Yingdong-Hu

Thanks for your comment. This is indeed a (small) difference and it should not affect performance in any significant way. I expect that one could show that both formulations are equivalent for a system where the probability of visiting any state in the MDP is the same (in expectation) -- which should be the case for the randomly initialized block pushing tasks and the physics simulation). For the Atari games this assumption is violated however and there could be (minor) differences in performance between either of the two implementations, but I would expect them to be smaller than the variance you get from the parameter initialization. If you happen to test this experimentally, please share your results here :)

On Sun, Feb 7, 2021 at 11:17 AM Alex [email protected] wrote:

Hello Kipf, I find there is a discrepancy between the loss mentioned in the paper.

According to Eq(5) in paper, for negative samples, you calculate the Euclidean distance between negative state sample at timestep t and state at timestep t+1.

However, in the code below, state and neg_state are both at timestep t.

self.neg_loss = torch.max( zeros, self.hinge - self.energy( state, action, neg_state, no_trans=True)).mean()

I noticed that the same question was also asked here https://github.com/tkipf/c-swm/issues/3.

I want to know if this is a bug ? Does the discrepancy affect the final performance ?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tkipf/c-swm/issues/7, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABYBYYCJMHLDCPCGIYKGDL3S5ZSCBANCNFSM4XHJKOYQ .

Feb 07 '21 15:02 tkipf

Hi, thanks for your answer. I understand that that both formulations are equivalent. But I have another question: For the Atari games, even if I set the same random seed, multiple executions of the application behave differently. According to my understanding, when I set the same seed, multiple calls to the program will produce the same result. Are there other nondeterministic operations in the code that cause the randomness ?

Feb 24 '21 03:02 Yingdong-Hu

Yes, I believe either the frame_skip between the individual actions (or for how many frames an action is applied) is not fully deterministic in the gym environment that our codebase uses. If you happen to have more insight into this, please feel free to share it here.

On Wed, Feb 24, 2021 at 4:34 AM Alex [email protected] wrote:

Hi, thanks for your answer. I now understand that that both formulations are equivalent. But I have another question: For the Atari games, even if I set the same random seed, multiple executions of the application behave differently. According to my understanding, when I set the same seed, multiple calls to the program will produce the same result. Are there other nondeterministic operations in the code that cause the randomness ?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tkipf/c-swm/issues/7#issuecomment-784727128, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABYBYYCQ3FYYXDUHQ75IVWLTARXWDANCNFSM4XHJKOYQ .

Feb 24 '21 11:02 tkipf

c-swm c-swm copied to clipboard

Implementation of hinge loss does not match the paper

c-swm
c-swm copied to clipboard