lightfm

No user feature matrix defined but still get ValueError: The user feature matrix specifies more features than there are estimated feature embeddings

Open · syl-kim opened this issue 3 years ago · 1 comment

I've seen a few posts that mention this error, but I haven't been able to find a solution that resolves my case. I'd really appreciate any tip, advice, or direction on how to resolve it. I don't have a user feature matrix for my model, only an item feature matrix. However, when I call model.predict (on a single user ID), I get the error: ValueError: The user feature matrix specifies more features than there are estimated feature embeddings.

I'm using data from Steam's API and this is what I have so far:

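import numpy as np
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.data import Dataset

dataset = Dataset()
dataset.fit((x['author.steamid'] for x in get_ratings()), (x['appID'] for x in get_ratings()))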

dataset.fit_partial(items=(x['appID'] for x in get_game_features()), item_features=(x['game_topic'] for x in get_game_features()))

(interactions, weights) = dataset.build_interactions(((x['author.steamid'], x['appID']) for x in get_ratings()))
print(repr(interactions))

<2873901x74 sparse matrix of type '<class 'numpy.int32'>' with 3973496 stored elements in COOrdinate format>

item_features = dataset.build_item_features(((x['appID'], [x['game_topic']]) for x in get_game_features()))
print(repr(item_features))

<74x93 sparse matrix of type '<class 'numpy.float32'>' with 148 stored elements in Compressed Sparse Row format>

model = LightFM(loss='warp')

(train, test) = random_train_test_split(interactions=interactions, test_percentage=0.2)

model.fit(train, item_features=item_features, epochs=2)

And then I try to predict for an individual id and get the error:

model.predict(np.int64(76561198360721908), np.arange(interactions.shape[1]), user_features=None, item_features=item_features)

ValueError                                Traceback (most recent call last)
in
----> 1 model.predict(np.int64(76561198360721908),np.arange(interactions.shape[1]),user_features=None,item_features=item_features)

~\Anaconda3\lib\site-packages\lightfm\lightfm.py in predict(self, user_ids, item_ids, item_features, user_features, num_threads)
    714
    715         (user_features,
--> 716          item_features) = self._construct_feature_matrices(n_users,
    717                                                            n_items,
    718                                                            user_features,

~\Anaconda3\lib\site-packages\lightfm\lightfm.py in _construct_feature_matrices(self, n_users, n_items, user_features, item_features)
    303         if self.user_embeddings is not None:
    304             if not self.user_embeddings.shape[0] >= user_features.shape[1]:
--> 305                 raise ValueError('The user feature matrix specifies more '
    306                                  'features than there are estimated '
    307                                  'feature embeddings: {} vs {}.'.format(

ValueError: The user feature matrix specifies more features than there are estimated feature embeddings: 2873901 vs 400456181.
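The second number in this error appears to come from the raw Steam ID itself. A minimal sketch, assuming LightFM casts user IDs to 32-bit integers in predict() and, when user_features is None, builds an identity user feature matrix with max(user_id) + 1 columns: the low 32 bits of 76561198360721908 are 400456180, so the implied feature count is 400456181, far more than the 2873901 user embeddings (one per row of the interactions matrix) learned during fit(). The to_int32 helper is hypothetical, written here only to mimic that cast in plain Python.

```python
# Hypothetical reconstruction of the two numbers in the error message,
# assuming LightFM casts user IDs to int32 and, with no user features,
# sizes the identity user feature matrix as max(user_id) + 1.

def to_int32(value: int) -> int:
    """Wrap a Python int the way a C-style int64 -> int32 cast does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


raw_steam_id = 76561198360721908
n_trained_user_embeddings = 2873901  # rows of the interactions matrix

truncated_id = to_int32(raw_steam_id)
implied_n_user_features = truncated_id + 1

print(truncated_id)             # 400456180
print(implied_n_user_features)  # 400456181
# The check in _construct_feature_matrices fails, hence the ValueError.
print(n_trained_user_embeddings >= implied_n_user_features)  # False
```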

Thanks in advance!

syl-kim, Oct 05 '20 00:10

You should pass the internal index of the user and the internal indices of the items to predict(), not the raw IDs.
You can look up these indices (LightFM maps each raw ID to an index) with dataset.mapping(). It returns a tuple of the form:
(user id map, user feature map, item id map, item feature map).
https://making.lyst.com/lightfm/docs/lightfm.data.html?highlight=mapping#lightfm.data.Dataset.mapping

igorkf, Nov 17 '20 04:11