Maciej Kula
These losses maximize the difference between the dot products of the positive and (implicit) negative items, and so using the dot product for prediction is appropriate.
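As a rough illustration (not the library's internal code), scoring reduces to a dot product between the learned user and item representations, optionally plus bias terms; the array names below are made up for the example:

```python
import numpy as np

def score(user_vec, item_vec, user_bias=0.0, item_bias=0.0):
    # Higher dot products mean the item is ranked as more relevant for the user.
    return np.dot(user_vec, item_vec) + user_bias + item_bias

# Ranking a handful of candidate items for a single user:
user_vec = np.array([0.1, -0.3, 0.5])
candidate_vecs = np.random.rand(10, 3)               # 10 items, 3 latent dimensions
ranking = np.argsort(-(candidate_vecs @ user_vec))   # best-scoring items first
```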
Are you trying to model a scenario where a user buys two or more items at the same time? You are right that this isn't supported here in a straightforward...
1. This is partially, but not entirely correct.
2. The position does not matter. Your features should look similar to the following:

```
[
    (item1, 'price:1', 'accept_credit_cards:False', 'smoking_allowed:True', 'category:bar'),
]
```
...
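For context, here is a sketch of how tuples like these could be passed to `Dataset.build_item_features`; the item id and feature names are just placeholders for the example:

```python
from lightfm.data import Dataset

dataset = Dataset()
dataset.fit(
    users=['user_1'],
    items=['item1'],
    item_features=['price:1', 'accept_credit_cards:False',
                   'smoking_allowed:True', 'category:bar'],
)

# Each entry is (item id, [feature names]); the order of the features
# within the list does not matter.
item_features = dataset.build_item_features([
    ('item1', ['price:1', 'accept_credit_cards:False',
               'smoking_allowed:True', 'category:bar']),
])
```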
This should be correct. I am working on tests to make sure that this really is true. In principle the parameters of the optimizer will get serialized as well, so...
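A minimal sketch of what I mean, assuming the model is simply pickled (the file name is arbitrary):

```python
import pickle
from lightfm import LightFM

model = LightFM(loss='warp', learning_schedule='adagrad')
# model.fit(interactions)  # fit on your interaction matrix first

# Pickling the model object should also capture the numpy arrays holding the
# optimizer state (e.g. the Adagrad accumulators), not just the embeddings.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)
```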
Certainly for Adagrad the learning rate goes to zero as the number of training examples gets large. I'm less sure this is true of Adam: I suspect it may converge...
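To make the Adagrad point concrete, here is a toy sketch (not LightFM code) of how the effective step size shrinks as squared gradients accumulate:

```python
import numpy as np

lr, accumulator, eps = 0.05, 0.0, 1e-6

for _ in range(10_000):
    gradient = np.random.randn()     # stand-in for a real gradient
    accumulator += gradient ** 2     # Adagrad keeps accumulating squared gradients
    effective_lr = lr / np.sqrt(accumulator + eps)

print(effective_lr)  # far smaller than the initial 0.05, and still shrinking
```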
Not really. While I think Adagrad is a poor choice (the learning rate goes to zero), I suspect Adam, SGD, and SGD with momentum will all work quite well in this...
The culprit here might be that the embeddings for all features of a given item are simply summed to get the final item embedding: the model does not seem to...
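Roughly what I mean, as a toy numpy sketch (shapes and names invented for the example): the item representation is just the sum of its features' embeddings, so one strong feature can dominate or cancel out the others:

```python
import numpy as np

feature_embeddings = np.random.rand(4, 8)          # 4 item features, 8 latent dimensions
feature_weights = np.array([1.0, 1.0, 1.0, 1.0])   # this item's row in the item-feature matrix

# The final item embedding is the (weighted) sum of its feature embeddings.
item_embedding = feature_weights @ feature_embeddings
```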
@FrancescoI the norm sounds like a very good approximation to feature importance. My hope was that this would work reliably, but I think in practice it's not always the case....
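As a sketch of the heuristic (the embedding matrix here is a random stand-in for a fitted model's `item_embeddings`):

```python
import numpy as np

item_feature_embeddings = np.random.rand(100, 16)  # 100 features, 16 components

# Rank features by the Euclidean norm of their embedding rows; larger norms
# suggest a larger influence on the predicted scores.
feature_norms = np.linalg.norm(item_feature_embeddings, axis=1)
most_important = np.argsort(-feature_norms)[:10]
```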
This is because the `ratings` attribute of your dataset has type `np.float64` instead of `np.float32`. I can offer two suggestions:

- convert the ratings to `float32`: `dataset.ratings = dataset.ratings.astype(np.float32)`
- ...
It is the latest release, but the master version here on GitHub has some new improvements.