DKN
There are no parameters for the MLP part in the paper.
Dear authors,
I found that the code implementation differs slightly from the structure presented in the paper (Figure 3). In the paper, both the attention weights and the final output are produced by a two-layer MLP. However, in the code dkn.py, they are implemented as:
```python
attention_weights = tf.reduce_sum(clicked_embeddings * news_embeddings_expanded, axis=-1)
```
and
```python
self.scores_unnormalized = tf.reduce_sum(user_embeddings * news_embeddings, axis=1)
```
These are just inner products between two vectors; there are no learnable parameters applied to the concatenation. My questions are:
- Why does this work? Can the model be trained well by only updating the embeddings, without any MLP weights?
- Why is the code implementation different from the paper?
- The inner product works according to the experiments; it is hard to say which one is always better.
- The code here has been refactored and simplified to reduce computation overhead.
However, if the attention network is just a simple "cosine similarity" between the user's clicked news and the candidate news, it cannot be called an attention "network", because it has no parameters and cannot learn anything. If an inner product can simply replace the attention network, it means that the attention network is meaningless.
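To make the difference concrete, here is a minimal NumPy sketch contrasting the two variants being discussed: the inner-product attention used in dkn.py versus the two-layer MLP attention described in the paper. The shapes, weight names (`W1`, `b1`, `W2`, `b2`), and random data are illustrative assumptions, not taken from the actual repo.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # embedding dimension (illustrative)
n_clicked = 5   # number of clicked news per user (illustrative)

clicked = rng.normal(size=(n_clicked, d))   # clicked-news embeddings
candidate = rng.normal(size=(d,))           # candidate-news embedding

# Code version (dkn.py): the attention weight is a plain inner product,
# so there are no attention-specific parameters to learn.
weights_dot = clicked @ candidate           # shape (n_clicked,)

# Paper version (Figure 3): a two-layer MLP scores the concatenation of
# each clicked embedding with the candidate embedding. W1/b1/W2/b2 are
# the extra learnable parameters the issue is asking about.
hidden = 8
W1 = rng.normal(size=(2 * d, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, 1))
b2 = np.zeros(1)

concat = np.concatenate([clicked, np.tile(candidate, (n_clicked, 1))], axis=1)
weights_mlp = (np.maximum(concat @ W1 + b1, 0) @ W2 + b2).squeeze(-1)

# Either score vector is then softmax-normalized into attention weights
# and used to form the user embedding as a weighted sum of clicked news.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

attn_dot = softmax(weights_dot)
attn_mlp = softmax(weights_mlp)
user_embedding = attn_dot @ clicked         # shape (d,)
```

In the inner-product variant, gradients still flow into the embeddings themselves (and the CNN layers that produce them), which is why it can still train; the only thing lost is the attention module's own capacity.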