mxnet-notebooks
deep matrix factorization question
The code currently looks like this:
def get_one_layer_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    user = mx.symbol.Activation(data = user, act_type="relu")
    user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    item = mx.symbol.Activation(data = item, act_type="relu")
    item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
    # predict by the inner product
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred
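As a sanity check on the prediction step in this first model, the elementwise product followed by sum_axis over axis 1 is just a row-wise dot product between each user/item pair. A minimal NumPy sketch (shapes and values are made up for illustration, not taken from the notebook):

```python
import numpy as np

# Hypothetical batch of 4 users and items with k=3 latent features each
user = np.arange(12, dtype=np.float64).reshape(4, 3)
item = np.full((4, 3), 2.0)

# Elementwise product, then sum over the feature axis
# (what mx.symbol.sum_axis(data=pred, axis=1) computes)
pred = (user * item).sum(axis=1)

# Equivalent to a dot product per row
rowwise = np.array([u @ i for u, i in zip(user, item)])
assert np.allclose(pred, rowwise)
```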
My understanding is that the embedding layer should be able to learn anything that an embedding followed by a single dense layer could learn, since the embedding weights are unconstrained. I had thought a deep matrix factorization would look something more like this:
def get_one_layer_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    # predict with an MLP on the concatenated latent features
    pred = mx.symbol.Concat(user, item, dim = 1)
    pred = mx.symbol.FullyConnected(data = pred, num_hidden = hidden)
    pred = mx.symbol.Activation(data = pred, act_type="relu")
    pred = mx.symbol.FullyConnected(data = pred, num_hidden = 1)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred
Basically, the network should take a concatenation of the latent variables and stack layers on top of that, instead of putting layers on top of each embedding separately.
The second one also makes sense, but you can think of the first one as a special case of the second: the former applies a FullyConnected layer to the user and item embeddings separately, which amounts to a FullyConnected layer on the concatenation whose weight matrix has a block structure.
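The "block structure" point can be made concrete: separate FullyConnected layers on the user and item embeddings compute the same thing as one FullyConnected layer on the concatenation whose weight matrix is block-diagonal, with zeros in the cross-term blocks. A hedged NumPy sketch (all shapes and weights are illustrative, biases omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
k, hidden, batch = 3, 4, 5

user = rng.normal(size=(batch, k))
item = rng.normal(size=(batch, k))
W_u = rng.normal(size=(k, hidden))   # weights of the user FullyConnected
W_i = rng.normal(size=(k, hidden))   # weights of the item FullyConnected

# First model: separate layers applied to user and item
separate = np.concatenate([user @ W_u, item @ W_i], axis=1)

# Second model: one layer on the concatenation, with a block-diagonal
# weight matrix whose off-diagonal blocks are zero
W_block = np.block([
    [W_u, np.zeros((k, hidden))],
    [np.zeros((k, hidden)), W_i],
])
joint = np.concatenate([user, item], axis=1) @ W_block

assert np.allclose(separate, joint)
```

So the concat-then-MLP model strictly generalizes the per-embedding layers: training is free to learn nonzero cross-term blocks that the first model cannot represent.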